I'm trying to get the predictions inside the on_epoch_end function of a Keras Callback.
At the moment, to get the predictions, I call self.model.predict with a batch_size of 2, but at the 3rd epoch I get this error:
RuntimeError: Dst tensor is not initialized in Tensorflow
Reading on the web, I noticed that this error appears when the GPU runs out of memory. In my case, according to the stack trace, the error is triggered by self.model.predict inside on_epoch_end:
File "mlp_keras.py", line 20, in on_epoch_end
predictions = self.model.predict(self.dataset)
This is the full stack trace:
Traceback (most recent call last):
File "mlp_keras.py", line 150, in <module>
callbacks=[KendallTauHistory(training_dataset, training_dataset_labels, groups_id_count)])
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
use_multiprocessing=use_multiprocessing)
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 397, in fit
prefix='val_')
File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 771, in on_epoch
self.callbacks.on_epoch_end(epoch, epoch_logs)
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/callbacks.py", line 302, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "mlp_keras.py", line 20, in on_epoch_end
predictions = self.model.predict(self.dataset)
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1013, in predict
use_multiprocessing=use_multiprocessing)
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 498, in predict
workers=workers, use_multiprocessing=use_multiprocessing, **kwargs)
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 426, in _model_iteration
use_multiprocessing=use_multiprocessing)
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 706, in _process_inputs
use_multiprocessing=use_multiprocessing)
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 357, in __init__
dataset = self.slice_inputs(indices_dataset, inputs)
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 383, in slice_inputs
dataset_ops.DatasetV2.from_tensors(inputs).repeat()
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 566, in from_tensors
return TensorDataset(tensors)
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2765, in __init__
element = structure.normalize_element(element)
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/data/util/structure.py", line 113, in normalize_element
ops.convert_to_tensor(t, name="component_%d" % i))
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/ops.py", line 1314, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/tensor_conversion_registry.py", line 52, in _default_conversion_function
return constant_op.constant(value, dtype, name=name)
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 258, in constant
allow_broadcast=True)
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 266, in _constant_impl
t = convert_to_eager_tensor(value, ctx, dtype)
File "/usr/home/studenti/sp171412/word_ordering/mlp/env/lib/python2.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
RuntimeError: Dst tensor is not initialized.
My question is: is there a way to get the predictions without performing predict inside on_epoch_end? Thanks in advance.
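Alright, after seeing your last comment, here is what you could do: train for one epoch at a time and call predict between the fit calls, outside of any callback:
epochs = 100
for epoch in range(epochs):
    # each fit call trains for a single epoch (the Keras default)
    model.fit(x_train, y_train)
    y_predict = model.predict(x_test)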
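If the out-of-memory diagnosis from the question is right, another option is to keep each predict call small. This is a minimal sketch, assuming the failure comes from converting the whole dataset to a tensor in one go (the traceback ends in convert_to_eager_tensor). The names predict_in_chunks and chunk_size are hypothetical, and predict_fn stands for something like model.predict:

```python
def predict_in_chunks(predict_fn, samples, chunk_size):
    """Call predict_fn on consecutive slices of samples and collect the results."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    predictions = []
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        # Only chunk_size rows are handed to the model at a time.
        predictions.extend(predict_fn(chunk))
    return predictions


# Stand-in for model.predict so the sketch runs without a trained model.
def fake_predict(chunk):
    return [2 * value for value in chunk]


print(predict_in_chunks(fake_predict, [1, 2, 3, 4, 5], chunk_size=2))
# [2, 4, 6, 8, 10]
```

Whether this avoids the error depends on where the memory is actually exhausted, so treat it as something to try rather than a confirmed fix.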
This is the model that I built. Please help me understand whether the problem is with my model or with something else.
The error occurs after this:
Train on 63828 samples, validate on 95743 samples
Epoch 1/1
63744/63828 [============================>.] - ETA: 2s - loss: 0.3427 - acc: 0.9943
The error occurs at the end. So I removed the validation set during training.
from tensorflow.python.keras.layers import Embedding, Input
from tensorflow.python.keras.layers import LSTM, Bidirectional, GlobalMaxPool1D, Dropout
embedding_layer = Embedding(num_of_words, EMBEDDING_DIM, weights=[embedding_matrix], input_length=MAX_SEQUENCE_LENGTH, trainable=False)
#building the model
#INPUT LAYER
input_layer = Input((MAX_SEQUENCE_LENGTH,))
#EMBEDDING LAYER
embedding_layer = embedding_layer(input_layer)
#BI-LSTM LAYER
lstm_layer_output = Bidirectional(LSTM(128, return_sequences=True))(embedding_layer)
lstm, forward_h, forward_c, backward_h, backward_c = Bidirectional \
(LSTM
(128,
dropout=0.2,
return_sequences=True,
return_state=True,
recurrent_activation='relu',
recurrent_initializer='glorot_uniform'))(embedding_layer)
from tensorflow.python.keras import backend as K
#CNN LAYER WITH KERNELS 3,4,5
from tensorflow.python.keras.layers import Conv1D, MaxPooling1D
first_conv_layer = Conv1D(128, 3, activation='relu')(lstm_layer_output)
first_max_pooling_layer = MaxPooling1D(3)(first_conv_layer)
second_conv_layer = Conv1D(128, 4, activation='relu')(first_max_pooling_layer)
second_max_pooling_layer = MaxPooling1D(4)(second_conv_layer)
third_conv_layer = Conv1D(128, 5, activation='relu')(second_max_pooling_layer)
#third_max_pooling_layer = MaxPooling1D(5)(third_conv_layer)
global_max_pooling = GlobalMaxPool1D()(third_conv_layer)
#from tensorflow.python.keras.layers import Concatenate
#merged_pooling_layers = Concatenate(axis=1)([first_max_pooling_layer,second_max_pooling_layer,third_max_pooling_layer])
#global_max_pooling = GlobalMaxPool1D()(merged_pooling_layers)
#implementing attentionlayer manually
from tensorflow.python.keras.layers import Add
rnn_output = Add()([forward_h,backward_h])
hidden_size = int(lstm.shape[2])
from tensorflow.python.keras.layers import Lambda
hsf = Lambda(lambda x: x[:, -1], output_shape=(hidden_size,), name='last_hidden_state_forward')(rnn_output)
from tensorflow.python.keras.layers import Multiply
from tensorflow.python.keras.layers import Lambda
def norm(m):
    return K.transpose(m)
u_t = Multiply()([Lambda(norm)(rnn_output),hsf])
context_vector = Multiply()([u_t,global_max_pooling])
def ex(m):
    # use the layer input rather than the outer context_vector
    return K.exp(m)
exp_u_t = Lambda(ex)(context_vector)
from tensorflow.python.keras.layers import Dense
attention_vector = Dense(128,activation='softmax')(exp_u_t)
x = Dense(64,activation="softmax")(weighted_input)
output_layer = Dense(6,activation="softmax")(x)
from tensorflow.python.keras.models import Model
from tensorflow.python.keras.optimizers import Adam
model = Model(input_layer,output_layer)
from tensorflow.python.keras import optimizers
model.compile(
    loss='categorical_crossentropy',
    optimizer='sgd',
    metrics=['accuracy']
)
print('Training model...')
r = model.fit(
    data,
    target_values,
    batch_size=128,
    epochs=1,
    validation_split=0.0
)
The error I got is this:
InvalidArgumentError (see above for traceback): Incompatible shapes: [84,6] vs. [128,6]
[[Node: training/SGD/gradients/loss/dense_3_loss/mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:#training/SGD/gradients/loss/dense_3_loss/mul_grad/Reshape_1"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](training/SGD/gradients/loss/dense_3_loss/mul_grad/Shape, training/SGD/gradients/loss/dense_3_loss/mul_grad/Shape_1)]]
Please help me fix this problem. Thank you.
Edit:
This is the traceback of the error
Epoch 1/1
31872/31914 [============================>.] - ETA: 1s - loss: 0.2419 Traceback (most recent call last):
File "<ipython-input-1-a7cc2e59a772>", line 165, in <module>
validation_split=0.8
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\keras\_impl\keras\engine\training.py", line 1216, in fit
validation_steps=validation_steps)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\keras\_impl\keras\engine\training_arrays.py", line 245, in fit_loop
outs = f(ins_batch)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\keras\_impl\keras\backend.py", line 2824, in __call__
fetches=fetches, feed_dict=feed_dict, **self.session_kwargs)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
run_metadata_ptr)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run
run_metadata)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
InvalidArgumentError: Incompatible shapes: [128,6] vs. [42,6]
[[Node: training/SGD/gradients/loss/dense_3_loss/logistic_loss/mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:#training/SGD/gradients/loss/dense_3_loss/logistic_loss/mul_grad/Reshape"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](training/SGD/gradients/loss/dense_3_loss/logistic_loss/mul_grad/Shape, training/SGD/gradients/loss/dense_3_loss/logistic_loss/mul_grad/Shape_1)]]
Caused by op 'training/SGD/gradients/loss/dense_3_loss/logistic_loss/mul_grad/BroadcastGradientArgs', defined at:
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 268, in <module>
main()
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 264, in main
kernel.start()
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\ipykernel\kernelapp.py", line 478, in start
self.io_loop.start()
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\zmq\eventloop\ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tornado\ioloop.py", line 888, in start
handler_func(fd_obj, events)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 233, in dispatch_shell
handler(stream, idents, msg)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\ipykernel\ipkernel.py", line 208, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 537, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2728, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2850, in run_ast_nodes
if self.run_code(code, result):
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-1-a7cc2e59a772>", line 165, in <module>
validation_split=0.8
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\keras\_impl\keras\engine\training.py", line 1216, in fit
validation_steps=validation_steps)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\keras\_impl\keras\engine\training_arrays.py", line 90, in fit_loop
model._make_train_function()
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\keras\_impl\keras\engine\training.py", line 572, in _make_train_function
params=self._collected_trainable_weights, loss=self.total_loss)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\keras\_impl\keras\optimizers.py", line 208, in get_updates
grads = self.get_gradients(loss, params)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\keras\_impl\keras\optimizers.py", line 114, in get_gradients
grads = K.gradients(loss, params)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\keras\_impl\keras\backend.py", line 2866, in gradients
loss, variables, colocate_gradients_with_ops=True)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 494, in gradients
gate_gradients, aggregation_method, stop_gradients)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 636, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 385, in _MaybeCompile
return grad_fn() # Exit early
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 636, in <lambda>
lambda: grad_fn(op, *out_grads))
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\ops\math_grad.py", line 874, in _MulGrad
rx, ry = gen_array_ops.broadcast_gradient_args(sx, sy)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 673, in broadcast_gradient_args
"BroadcastGradientArgs", s0=s0, s1=s1, name=name)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
op_def=op_def)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
...which was originally created as op 'loss/dense_3_loss/logistic_loss/mul', defined at:
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 268, in <module>
main()
[elided 16 identical lines from previous traceback]
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-1-a7cc2e59a772>", line 153, in <module>
optimizer='sgd',
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\keras\_impl\keras\engine\training.py", line 428, in compile
output_loss = weighted_loss(y_true, y_pred, sample_weight, mask)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\keras\_impl\keras\engine\training_utils.py", line 438, in weighted
score_array = fn(y_true, y_pred)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\keras\_impl\keras\losses.py", line 116, in binary_crossentropy
return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\keras\_impl\keras\backend.py", line 3448, in binary_crossentropy
return nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_impl.py", line 181, in sigmoid_cross_entropy_with_logits
relu_logits - logits * labels,
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 979, in binary_op_wrapper
return func(x, y, name=name)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1211, in _mul_dispatch
return gen_math_ops.mul(x, y, name=name)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 4758, in mul
"Mul", x=x, y=y, name=name)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
op_def=op_def)
File "C:\Users\JCMat\New\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Incompatible shapes: [128,6] vs. [42,6]
[[Node: training/SGD/gradients/loss/dense_3_loss/logistic_loss/mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:#training/SGD/gradients/loss/dense_3_loss/logistic_loss/mul_grad/Reshape"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](training/SGD/gradients/loss/dense_3_loss/logistic_loss/mul_grad/Shape, training/SGD/gradients/loss/dense_3_loss/logistic_loss/mul_grad/Shape_1)]]
The issue is that your last batch doesn't contain 128 rows, but only 84, since the size of your dataset isn't evenly divisible by the batch size. Either adjust your code to allow for a dynamic number of rows, or try padding the last batch.
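The two remainders in the error messages follow directly from the sample counts in the logs. Here is a minimal sketch of that arithmetic in plain Python. The helper names last_batch_size and trim_to_full_batches are made up for illustration, and dropping the trailing samples is only one possible workaround, not necessarily what the answer above had in mind:

```python
def last_batch_size(num_samples, batch_size):
    """Return the number of rows in the final batch of an epoch."""
    remainder = num_samples % batch_size
    # A remainder of 0 means the final batch is a full one.
    return remainder if remainder else batch_size


def trim_to_full_batches(samples, batch_size):
    """Drop trailing samples so that every batch has exactly batch_size rows."""
    usable = len(samples) - (len(samples) % batch_size)
    return samples[:usable]


print(last_batch_size(63828, 128))  # 84, as in "[84,6] vs. [128,6]"
print(last_batch_size(31914, 128))  # 42, as in "[128,6] vs. [42,6]"
print(len(trim_to_full_batches(list(range(63828)), 128)))  # 63744
```

The 63744 also matches the progress bar in the question, which stops at 63744/63828 right before the failing batch.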
I'm trying to fine-tune ResNet50 in half-precision mode without success. It seems there are parts of the model that are not compatible with float16. Here is my code:
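# imports assumed from the traceback below (standalone Keras)
from keras import backend as K
from keras.models import Sequential
from keras.applications.resnet50 import ResNet50
dtype='float16'
K.set_floatx(dtype)
K.set_epsilon(1e-4)
model = Sequential()
model.add(ResNet50(weights='imagenet', include_top=False, pooling='avg'))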
and I get this error:
Traceback (most recent call last):
File "train_resnet.py", line 40, in <module>
model.add(ResNet50(weights='imagenet', include_top=False, pooling='avg'))
File "/usr/local/lib/python3.6/dist-packages/keras/applications/__init__.py", line 28, in wrapper
return base_fun(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/keras/applications/resnet50.py", line 11, in ResNet50
return resnet50.ResNet50(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/keras_applications/resnet50.py", line 231, in ResNet50
x = layers.BatchNormalization(axis=bn_axis, name='bn_conv1')(x)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py", line 457, in __call__
output = self.call(inputs, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/keras/layers/normalization.py", line 185, in call
epsilon=self.epsilon)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 1864, in normalize_batch_in_training
epsilon=epsilon)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 1839, in _fused_normalize_batch_in_training
data_format=tf_data_format)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_impl.py", line 1329, in fused_batch_norm
name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 4488, in fused_batch_norm_v2
name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 626, in _apply_op_helper
param_name=input_name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 60, in _SatisfiesTypeConstraint
", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: Value passed to parameter 'scale' has DataType float16 not in list of allowed values: float32
This was a reported bug and upgrading to Keras==2.2.5 solved the issue.
I have replicated a deep CNN from a research paper. When I originally constructed the model, I assumed that the batch size would be one. However, now that I have learned more about batch sizes, I want to use a batch size of 40.
Here is the Github Repository
This is a very deep network, so I will show a more basic version of the project below:
x = tf.placeholder(tf.float32, shape=[None, 7168])
y_ = tf.placeholder(tf.float32, shape=[None, 7168, 3])
#MANY CONVOLUTIONS OMITTED HERE
#one of many transpose convolutions, the 40 here is a change I made for the batch size
w = tf.Variable(tf.constant(1.,shape=[2,2,4,1,192]))
DeConnv1 = tf.nn.conv3d_transpose(layer1, filter = w, output_shape = [40,32,32,7,1], strides = [1,2,2,2,1], padding = 'SAME')
#I reshape the final convolution's batch size because I was getting errors
final = tf.reshape(final, [40, 7168])
#Accuracy and loss functions omitted because they do not deal with batch size
#Lastly, I train the model, where a and b are of size [40][7169][3]; 40 is the batch size
train_step.run(feed_dict={x: a, y_: b, keep_prob: .5})
When I run the code from the repository, I get this error. What more changes do I need to make so that the batch size is 40?
Traceback (most recent call last):
File "<stdin>", line 31, in <module>
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2042, in run
_run_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 4490, in _run_using_default_session
session.run(operation, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cuDNN Backward Data function launch failure : input shape([320,4,4,1,896]) filter shape([3,3,1,896,800])
[[Node: gradients/conv3d_2/Conv3D_grad/Conv3DBackpropInputV2 = Conv3DBackpropInputV2[T=DT_FLOAT, data_format="NDHWC", padding="VALID", strides=[1, 1, 1, 1, 1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients/conv3d_2/Conv3D_grad/Shape, conv3d_1/kernel/read, gradients/conv3d_2/BatchToSpaceND_grad/SpaceToBatchND)]]
Caused by op u'gradients/conv3d_2/Conv3D_grad/Conv3DBackpropInputV2', defined at:
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 343, in minimize
grad_loss=grad_loss)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 414, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 581, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 353, in _MaybeCompile
return grad_fn() # Exit early
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 581, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_grad.py", line 82, in _Conv3DGrad
data_format=data_format),
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 1084, in conv3d_backprop_input_v2
data_format=data_format, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
...which was originally created as op u'conv3d_2/Conv3D', defined at:
File "<stdin>", line 2, in <module>
File "<stdin>", line 2, in conv3d_dilation
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/convolutional.py", line 809, in conv3d
return layer.apply(inputs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 671, in apply
return self.__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 575, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/convolutional.py", line 167, in call
outputs = self._convolution_op(inputs, self.kernel)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 835, in __call__
return self.conv_op(inp, filter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 499, in __call__
return self.call(inp, filter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 492, in _with_space_to_batch_call
result = self.op(input_converted, filter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 187, in __call__
name=self.name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 847, in conv3d
padding=padding, data_format=data_format, name=name)
InternalError (see above for traceback): cuDNN Backward Data function launch failure : input shape([320,4,4,1,896]) filter shape([3,3,1,896,800])
[[Node: gradients/conv3d_2/Conv3D_grad/Conv3DBackpropInputV2 = Conv3DBackpropInputV2[T=DT_FLOAT, data_format="NDHWC", padding="VALID", strides=[1, 1, 1, 1, 1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients/conv3d_2/Conv3D_grad/Shape, conv3d_1/kernel/read, gradients/conv3d_2/BatchToSpaceND_grad/SpaceToBatchND)]]
Try this:
shape = tf.shape(tf.reshape(x, [-1, 32, 32, 7, 1]))
DeConnv1 = tf.nn.conv3d_transpose(layer1, filter=w, output_shape=shape, strides=[1,2,2,2,1], padding='SAME')
final = tf.reshape(final, [-1, 7168])
This way you don't hard-code 40 in the model and are able to feed any batch size you want, including 40.
Best practice is to avoid hard coding the batch size in the graph. As discussed here (see "How do I build a graph that works with variable batch sizes?") you should specify shapes as [None, nx, ny, nz], and retrieve batch sizes using tf.shape(input)[0]. Also, for reshaping you can use a form like this: tf.reshape(input, [-1, nx, ny, nz]) where the -1 specifies that the batch dimension should be set to the appropriate size during runtime.
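To make the -1 rule concrete, here is a small plain-Python sketch of the arithmetic a reshape performs when one dimension is given as -1. The function resolve_shape is a hypothetical helper written for this illustration, not a TensorFlow API; the shapes are the ones from the question.

```python
from functools import reduce
from operator import mul


def resolve_shape(total_elements, shape):
    """Replace a single -1 in shape with the size implied by total_elements."""
    if shape.count(-1) != 1:
        raise ValueError("exactly one dimension must be -1")
    known = reduce(mul, (d for d in shape if d != -1), 1)
    if total_elements % known != 0:
        raise ValueError("total size is not divisible by the known dimensions")
    return [total_elements // known if d == -1 else d for d in shape]


# A batch of 40 rows with 7168 values each, as fed to the placeholder x.
total = 40 * 7168
print(resolve_shape(total, [-1, 32, 32, 7, 1]))  # [40, 32, 32, 7, 1]
print(resolve_shape(total, [-1, 7168]))          # [40, 7168]
# The same shape specification resolves to a different batch size for other inputs.
print(resolve_shape(1 * 7168, [-1, 7168]))       # [1, 7168]
```

Because 32 * 32 * 7 * 1 equals 7168, each input row maps onto exactly one [32, 32, 7, 1] volume, and the leading dimension is whatever batch size was fed.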
I'm kind of new to working with TensorFlow and my problem may be easy to solve, at least I hope so.
I'm trying to work with LSTMCell to predict the next label in a sequence.
Here is the code I'm using :
import tensorflow as tf
max_sequence_length = 1000
vector_length = 1
number_of_classes = 1000
batch_size = 50
num_hidden = 24
# Define graph
data = tf.placeholder(tf.int64, [None, max_sequence_length, vector_length])
# 0 must be a free class so that the mask can work
target = tf.placeholder(tf.int64, [None, max_sequence_length, number_of_classes + 1])
labels = tf.argmax(target, 2)
cell = tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True)
Then I try to get the real length of each sequence in the batch
no_of_batches = tf.shape(data)[0]
sequence_lengths = tf.zeros([batch_size])
for i in xrange(max_sequence_length):
    data_at_t = tf.squeeze(tf.slice(data, [0,i,0],[-1,1,-1]))
    t = tf.scalar_mul(i, tf.ones([batch_size]))
    boolean = tf.not_equal(data_at_t, tf.zeros([no_of_batches, batch_size], dtype = tf.int64))
    sequence_lengths = tf.select(boolean, t, sequence_lengths)
And finally I try to call tf.nn.dynamic_rnn :
outputs, state = tf.nn.dynamic_rnn(
cell = cell,
inputs = data,
sequence_length = max_sequence_length,
dtype = tf.float64
)
Then, I get a TypeError:
Traceback (most recent call last):
File "<stdin>", line 5, in <module>
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 830, in dynamic_rnn
dtype=dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 997, in _dynamic_rnn_loop
swap_memory=swap_memory)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1973, in while_loop
result = context.BuildLoop(cond, body, loop_vars)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1860, in BuildLoop
pred, body, original_loop_vars, loop_vars)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1810, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 980, in _time_step
skip_conditionals=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 394, in _rnn_step
new_output, new_state = call_cell()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 968, in <lambda>
call_cell = lambda: cell(input_t, state)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell.py", line 489, in __call__
dtype, self._num_unit_shards)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell.py", line 323, in _get_concat_variable
sharded_variable = _get_sharded_variable(name, shape, dtype, num_shards)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell.py", line 353, in _get_sharded_variable
dtype=dtype))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 830, in get_variable
custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 673, in get_variable
custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 217, in get_variable
validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 202, in _true_getter
caching_device=caching_device, validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 536, in _get_single_variable
validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 211, in __init__
dtype=dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 281, in _init_from_args
self._initial_value = ops.convert_to_tensor(initial_value(),
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 526, in <lambda>
init_val = lambda: initializer(shape.as_list(), dtype=dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/init_ops.py", line 210, in _initializer
dtype, seed=seed)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/random_ops.py", line 235, in random_uniform
minval = ops.convert_to_tensor(minval, dtype=dtype, name="min")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 621, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 180, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 163, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_util.py", line 353, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_util.py", line 290, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int64, got -0.34641016151377546 of type 'float' instead.
I don't understand where this float comes from, as all other values in the script are integers. How can I solve this problem?
1) The RNN cell state is tf.float64. You set this explicitly within the tf.nn.dynamic_rnn call (dtype). The tensorflow engine then initialized the states with RNN's default random_uniform initializer. This is why you have the -0.34 float value there.
I'm not sure what you wanted to achieve. Please refer to https://www.tensorflow.org/versions/r0.11/api_docs/python/nn.html#dynamic_rnn
2) The sequence_length = max_sequence_length must be an int32/int64 vector sized [batch_size] instead of scalar 1000
3) You may want to pass an initializer to the LSTMCell too (it is used for the cell's weight matrices):
cell = tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True,
    initializer=tf.constant_initializer(value=0, dtype=tf.int32))
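Regarding point 2, a per-example length vector can be derived from the zero-padded input itself. The sketch below does this in plain Python on nested lists, assuming (as the comment in the question suggests) that 0 is reserved for padding. The helper sequence_lengths is hypothetical, and in a real graph the same computation would have to be expressed with tensor ops.

```python
def sequence_lengths(batch, pad_value=0):
    """Return, for each padded sequence, the index of its last non-pad step plus one."""
    lengths = []
    for sequence in batch:
        length = 0
        for step, value in enumerate(sequence):
            if value != pad_value:
                # Remember the most recent non-padding position.
                length = step + 1
        lengths.append(length)
    return lengths


# Three sequences padded with zeros to a common length of 5.
batch = [
    [7, 3, 9, 0, 0],
    [4, 0, 0, 0, 0],
    [1, 2, 3, 4, 5],
]
print(sequence_lengths(batch))  # [3, 1, 5]
```

The result has one entry per row of the batch, which is the [batch_size] shape that sequence_length expects instead of the scalar 1000.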
I had a similar problem with my code. I fixed it by changing the placeholder dtype from int32 to float; check if that works for you.