First off, this is not my code. I just changed it so it can be trained on a TPU. The original author is here. I am able to run it on the GPU-accelerated runtime in a Colaboratory notebook, but it seems to break when I switch to the TPU-accelerated runtime.
Here is my notebook. It just gives me an error that the activation layer's target is not the right shape:
ValueError: Error when checking target: expected activation_21 to have shape (1,) but got array with shape (205,)
I would appreciate any help I can get, as I've spent about three hours debugging this.
Since you are one-hot encoding the labels and therefore they are not sparse, you need to use 'categorical_accuracy' as the metric:
model.compile(..., metrics=['categorical_accuracy'])
or more succinctly use 'accuracy' to let Keras infer the right metric based on the loss function used (which in this case would be 'categorical_accuracy' since you are using categorical_crossentropy as the loss function):
model.compile(..., metrics=['accuracy'])
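A minimal, self-contained sketch of what that looks like; the model, input shape, and 205-class output below are assumptions for illustration, not the notebook's actual architecture:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

num_classes = 205                       # assumed to match the (205,) targets
x = np.random.rand(32, 64)              # dummy features
y = to_categorical(np.random.randint(num_classes, size=32), num_classes)

model = Sequential([Dense(num_classes, activation='softmax', input_shape=(64,))])
model.compile(loss='categorical_crossentropy',   # one-hot targets -> categorical loss
              optimizer='adam',
              metrics=['accuracy'])              # resolves to categorical_accuracy here
model.fit(x, y, epochs=1, verbose=0)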
Related
I am trying to use tf.distribute.MirroredStrategy() for multi-GPU training in TensorFlow 2, on a model with CTC loss.
The problem is that the model needs target_tensors defined in order to compile.
What could be causing that?
Is there a workaround to compile the model without defining target_tensors?
If I do not pass the targets, I get the following:
TypeError: Value passed to parameter 'indices' has DataType float32 not in list of allowed values: uint8, int32, int64
The model is defined with the Keras functional API, using something like:
model = Model(name='Joined_Model_2', inputs=self.inp, outputs=[self.network.outp, self.network.outp_stt])
The model must be compiled as:
self.model_joined.compile(optimizer=optimizer_stt,
                          loss=losses,
                          loss_weights=lossWeights,
                          target_tensors=[target1, target2])
The model has 2 outputs, but the CTC loss used on the second one is causing the problem.
This is solved by using the tf-nightly version.
tf-nightly doesn't allow using target_tensors in eager execution mode.
With the nightly version, my model compiled successfully without target_tensors (no changes to the implementation), so the problem is solved.
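For reference, a hedged sketch of what the compile call looks like once target_tensors is dropped; the names are taken from the question, and losses is assumed to already contain whatever CTC wrapper the model uses:

import tensorflow as tf
from tensorflow.keras import Model

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = Model(name='Joined_Model_2',
                  inputs=self.inp,
                  outputs=[self.network.outp, self.network.outp_stt])
    model.compile(optimizer=optimizer_stt,
                  loss=losses,               # e.g. [primary_loss, ctc_loss_fn]
                  loss_weights=lossWeights)  # no target_tensors needed on tf-nightly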
Using TensorFlow nightly (2.0), I have custom losses and metrics in my call to model.compile. When running with:
tf.config.experimental_run_functions_eagerly(True)
everything works fine. If I don't turn on experimental eager execution, for some reason calling:
self._model.compile(optimizer="Adam",
                    loss=[
                        balanced_cross_entropy,
                        intersection_over_union,
                        angle_loss
                    ],
                    metrics=[
                        [image_logging_metric('RBOX Score Map')],
                        [image_logging_metric('RBOX Shapes')],
                        [image_logging_metric('RBOX Angles')],
                    ])
calls all of my loss and metric functions with empty tensors whose dimensions don't match the shape of my expected inputs. I can't find any documentation about writing losses and metrics differently for graph mode, and I don't understand why they are called as part of compilation.
One other thing to note: I have a dynamic input shape of (None, None, None, 3), and I'm guessing that's why the dimensions passed to my functions are unexpectedly small, but the unspecified shape is intentional and works in eager execution, since everything is scaled with convolutions.
So I'm wondering, why are losses and metrics being called on compile, and is there an intended way to handle this situation?
I figured out I just needed to pass target_tensors with the desired shapes to compile, even though I'm using fit_generator and a Sequence.
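A hedged sketch of what that can look like; the loss names come from the question, and the placeholder shapes below are invented examples (one target tensor per model output), not this model's actual shapes:

import tensorflow as tf
K = tf.keras.backend

self._model.compile(optimizer="Adam",
                    loss=[balanced_cross_entropy,
                          intersection_over_union,
                          angle_loss],
                    target_tensors=[
                        K.placeholder(shape=(None, None, None, 1)),  # score map target (example)
                        K.placeholder(shape=(None, None, None, 4)),  # shapes target (example)
                        K.placeholder(shape=(None, None, None, 1)),  # angles target (example)
                    ])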
Does anyone have experience with mixed-precision training using the TensorFlow Estimator API?
I tried casting my inputs to tf.float16 and the results of the network back to tf.float32. For scaling the loss I used tf.contrib.mixed_precision.LossScaleOptimizer.
The error messages I get are relatively uninformative: "Tried to convert 'x' to a tensor and failed. Error: None values not supported".
I found the issue: I used tf.get_variable to store the learning rate. This variable has no gradient. Normal optimizers do not care, but tf.contrib.mixed_precision.LossScaleOptimizer crashes. Therefore, make sure these variables are not added to tf.GraphKeys.TRAINABLE_VARIABLES.
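A minimal sketch of the fix, assuming a plain TF 1.x setup; the optimizer choice and loss-scale manager settings here are examples, not the original code:

import tensorflow as tf

# Keep the learning-rate variable out of TRAINABLE_VARIABLES so the
# loss-scale optimizer never tries to compute a gradient for it.
learning_rate = tf.get_variable("learning_rate", initializer=1e-3, trainable=False)

opt = tf.train.AdamOptimizer(learning_rate)
loss_scale_manager = tf.contrib.mixed_precision.ExponentialUpdateLossScaleManager(
    init_loss_scale=2**15, incr_every_n_steps=2000)
opt = tf.contrib.mixed_precision.LossScaleOptimizer(opt, loss_scale_manager)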
I am new to TensorFlow and machine learning. Recently I have been working on a model. My model looks like this:
Character-level embedding vector -> embedding lookup -> LSTM1
Word-level embedding vector -> embedding lookup -> LSTM2
[LSTM1 + LSTM2] -> single-layer MLP -> softmax layer
[LSTM1 + LSTM2] -> single-layer MLP -> WGAN discriminator
Code of the RNN model
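For orientation, a rough TF 1.x sketch of the architecture described above; every name and size is an assumption, and the WGAN discriminator branch is omitted, so this is not the author's actual code:

import tensorflow as tf

char_vocab, word_vocab, n_classes = 100, 20000, 10        # example sizes
char_ids = tf.placeholder(tf.int32, [None, None], name="char_ids")
word_ids = tf.placeholder(tf.int32, [None, None], name="word_ids")

char_emb = tf.get_variable("char_emb", [char_vocab, 50])
word_emb = tf.get_variable("word_emb", [word_vocab, 300])
char_in = tf.nn.embedding_lookup(char_emb, char_ids)      # character-level lookup
word_in = tf.nn.embedding_lookup(word_emb, word_ids)      # word-level lookup

def bi_lstm(inputs, size, scope):
    # Bidirectional LSTM; returns the concatenated last forward/backward outputs.
    with tf.variable_scope(scope):
        fw = tf.nn.rnn_cell.LSTMCell(size)
        bw = tf.nn.rnn_cell.LSTMCell(size)
        (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(fw, bw, inputs,
                                                              dtype=tf.float32)
        return tf.concat([out_fw[:, -1, :], out_bw[:, -1, :]], axis=-1)

lstm1 = bi_lstm(char_in, 100, "chars")    # 100 units -> 200 bidirectional
lstm2 = bi_lstm(word_in, 300, "words")    # 300 units -> 600 bidirectional
joined = tf.concat([lstm1, lstm2], axis=-1)

hidden = tf.layers.dense(joined, 256, activation=tf.nn.relu)   # single-layer MLP
logits = tf.layers.dense(hidden, n_classes)
probs = tf.nn.softmax(logits)             # WGAN discriminator branch omitted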
While working on this model, I got the following error. I thought my batch was too big, so I tried reducing the batch size from 20 to 10, but it didn't help.
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[24760,100]
  [[Node: chars/bidirectional_rnn/bw/bw/while/bw/lstm_cell/split =
    Split[T=DT_FLOAT, num_split=4, _device="/job:localhost/replica:0/task:0/device:GPU:0"]
    (gradients_2/Add_3/y, chars/bidirectional_rnn/bw/bw/while/bw/lstm_cell/BiasAdd)]]
  [[Node: bi-lstm/bidirectional_rnn/bw/bw/stack/_167 =
    _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0",
    send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1,
    tensor_name="edge_636_bi-lstm/bidirectional_rnn/bw/bw/stack", tensor_type=DT_INT32,
    _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
A tensor with shape [24760,100] means 24760 * 100 * 32 bits / 8 / (1024 * 1024) ≈ 9.44 MB of memory. I am running the code on a Titan X (11 GB) GPU. What could be going wrong? Why does this type of error occur?
*Extra info*: the size of LSTM1 is 100; for the bidirectional LSTM it becomes 200. The size of LSTM2 is 300; for the bidirectional LSTM it becomes 600.
*Note*: The error occurred after 32 epochs. My question is why the error appears after 32 epochs and not at the initial epoch.
I have been tweaking a lot these days trying to solve this problem.
In the end, I haven't solved the mystery of the memory size described in the question. My guess is that while computing the gradients, TensorFlow accumulates a lot of additional memory. I would need to check the TensorFlow source, which seems very cumbersome at this point. You can check how much memory your model is using from the terminal with the following command:
nvidia-smi
Judging from its output, you can estimate how much additional memory you have available.
But the solution to this type of problem lies in reducing the batch size. In my case, reducing the batch size to 3 worked. This may vary from model to model.
But what if you are using a model where the embedding matrix is so big that you cannot load it into memory?
The solution is to write some painful code.
You have to do the lookup on the embedding matrix outside the model and then load the looked-up embeddings into the model. In short, for each batch, you have to give the lookup matrices to the model (feed them via the feed_dict argument of sess.run()).
Next you will face a new problem:
You cannot make the embeddings trainable this way. The solution is to feed the embeddings through a placeholder and assign them to a Variable (say A). After each batch of training, the learning algorithm updates the variable A. Then read the value of A back out of TensorFlow and write it into your embedding matrix, which lives outside the model. (I said the process is painful.)
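A sketch of that placeholder/assign trick in TF 1.x; all names and sizes here are invented for illustration:

import numpy as np
import tensorflow as tf

vocab_size, embed_dim = 50000, 300
pretrained = np.random.rand(vocab_size, embed_dim).astype(np.float32)  # stand-in for the external matrix

emb_placeholder = tf.placeholder(tf.float32, [vocab_size, embed_dim])
A = tf.get_variable("A", [vocab_size, embed_dim])          # trainable by default
load_embeddings = A.assign(emb_placeholder)

ids = tf.placeholder(tf.int32, [None, None])
looked_up = tf.nn.embedding_lookup(A, ids)                 # used by the rest of the model

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Push the externally stored embeddings into the graph for this batch/chunk.
    sess.run(load_embeddings, feed_dict={emb_placeholder: pretrained})
    # ... run the training step, then read A back and write it to the external matrix:
    pretrained = sess.run(A)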
Now your next question should be: what if even the per-batch embedding lookup is too big to feed to the model? This is a fundamental problem you cannot avoid, because it comes down to GPU memory. That's why the NVIDIA GTX 1080, 1080 Ti, and TITAN Xp differ so much in price, even though the 1080 Ti and 1080 run at higher clock frequencies.
*Note*: The error occurred after 32 epochs. My question is why the error appears after 32 epochs and not at the initial epoch.
This is a major clue that the graph is not static during execution. By that I mean, you're likely doing sess.run(tf.something) instead of
my_something = tf.something
with tf.Session() as sess:
sess.run(my_something)
I ran into the same problem trying to implement a stateful RNN. I would occasionally reset the state, so I was doing sess.run([reset if some_condition else tf.no_op()]). Simply adding nothing = tf.no_op() to my graph and using sess.run([reset if some_condition else nothing]) solved my problem.
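A small sketch of the difference; state_var and the reset condition are invented stand-ins:

import tensorflow as tf

state_var = tf.get_variable("rnn_state", shape=[1, 100], trainable=False)
reset = state_var.assign(tf.zeros_like(state_var))
nothing = tf.no_op()                       # built once, outside the training loop

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(200):
        some_condition = (step % 50 == 0)  # placeholder reset condition
        # Calling tf.no_op() here instead of reusing `nothing` would add a new
        # node to the graph on every iteration and eventually exhaust memory.
        sess.run([reset if some_condition else nothing])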
If you could post the training loop, it would be easier to tell if that is what's going wrong.
I also faced the same problem while training a conv-autoencoder model. I solved it by reducing the batch size. My earlier batch size was 64; to resolve this error, I reduced it to 32 and it worked!
I'm trying to model a technical process (a number of nonlinear equations) with artificial neural networks. The function has a number of inputs and a number of outputs (e.g. 50 inputs, 150 outputs - all floats).
I have tried the Python library ffnet (a wrapper for a Fortran library) with great success. The errors for a certain dataset are well below 0.2%.
It uses a fully connected graph and these additional parameters.
Basic assumptions and limitations:
Network has feed-forward architecture.
Input units have identity activation function, all other units have sigmoid activation function.
Provided data are automatically normalized, both input and output, with a linear mapping to the range (0.15, 0.85). Each input and output is treated separately (i.e. linear map is unique for each input and output).
Function minimized during training is a sum of squared errors of each output for each training pattern.
I am using one input layer, one hidden layer (size: 2/3 of the input vector plus the size of the output vector), and an output layer. I'm using the SciPy conjugate-gradient optimizer.
The downside of ffnet is the long training time and the lack of GPU support. Therefore I want to switch to a different framework and have chosen Keras with TensorFlow as the backend.
I have tried to model the previous configuration:
model = Sequential()
model.add(Dense(n_hidden, input_dim=n_in))
model.add(BatchNormalization())
model.add(Dense(n_hidden))
model.add(Activation('sigmoid'))
model.add(Dense(n_out))
model.add(Activation('sigmoid'))
model.summary()
model.compile(loss='mean_squared_error',
              optimizer='Adamax',
              metrics=['accuracy'])
However, the results are far worse: the error is up to 0.5% even after a few thousand (!) epochs of training, whereas the ffnet training was automatically stopped at 292 epochs. Furthermore, the differences between the network response and the validation targets are not centered around 0 but are mostly negative.
I have tried all the optimizers and different loss functions. I have also skipped the BatchNormalization and normalized the data manually in the same way that ffnet does it. Nothing helps.
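As a reference point, a minimal NumPy sketch of a per-column linear map into (0.15, 0.85) as described in the ffnet notes above; this is my own reconstruction, not ffnet's code:

import numpy as np

def linmap(data, lo=0.15, hi=0.85):
    # Map each column independently into (lo, hi), as ffnet's notes describe.
    d_min = data.min(axis=0)
    d_max = data.max(axis=0)
    return lo + (data - d_min) * (hi - lo) / (d_max - d_min)

X = np.random.rand(1000, 50) * 100.0   # example: 50 float inputs
X_norm = linmap(X)                     # every column now lies in [0.15, 0.85]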
Does anyone have a suggestion for obtaining better results with Keras?
I understand you are trying to re-train the same architecture from scratch with a different library. The first fundamental issue to keep in mind here is that neural nets are not necessarily reproducible when weights are initialized randomly.
For example, here is the default constructor parameter for Dense in Keras:
init='glorot_uniform'
But even before trying to evaluate the convergence of Keras's optimization, I would recommend trying to port the weights that gave you good results from ffnet into your Keras model. You can do so either with the weights= kwarg of each Dense layer, or globally at the end with model.set_weights(...).
Using the same weights should yield exactly the same result in both libraries, unless you run into floating-point rounding issues. I believe that until porting the weights gives consistent results, working on the optimization is unlikely to help.
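A hedged sketch of the porting step for the Sequential model above; the zero arrays are placeholders for whatever weights you extract from the trained ffnet network yourself:

import numpy as np

for layer in model.layers:
    print(layer.name, [w.shape for w in layer.get_weights()])   # inspect expected shapes

dense_layers = [l for l in model.layers if isinstance(l, Dense)]
dense_layers[0].set_weights([np.zeros((n_in, n_hidden)), np.zeros(n_hidden)])      # input -> hidden
dense_layers[1].set_weights([np.zeros((n_hidden, n_hidden)), np.zeros(n_hidden)])  # hidden -> hidden
dense_layers[2].set_weights([np.zeros((n_hidden, n_out)), np.zeros(n_out)])        # hidden -> output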