Tensorflow's cycle gan model: error in fit method

Tensorflow's cycle gan model: error in fit method - python

I tried to run the code from here: https://keras.io/examples/generative/cyclegan/
but when running model.fit(..) I get the following error:
ValueError: Model <__main__.CycleGan object at 0x7fec3de767c0> cannot be saved either because the
input shape is not available or because the forward pass of the model is not defined.To define a
forward pass, please override `Model.call()`. To specify an input shape, either call
`build(input_shape)` directly, or call the model on actual data using `Model()`, `Model.fit()`, or
`Model.predict()`. If you have a custom training step, please make sure to invoke the forward pass
in train step through `Model.__call__`, i.e. `model(inputs)`, as opposed to `model.call()`.
Even if I run it through the linked colab file from the author. Did anyone else already faced this issue and knows how to fix it ?
I also tried to predefine the inputsize by model.build((batch_size,256,256,3)), but still get the same error.
If i comment the callbacks, it works. I think the problem is in the model_checkpoint_callback. Without it, the code works, but then I don't save the model.
Thanks a lot in advance for every answer!

The solution is to add this line:
# Create cycle gan model
cycle_gan_model = CycleGan(
generator_G=gen_G, generator_F=gen_F, discriminator_X=disc_X, discriminator_Y=disc_Y
)
# add following line:
cycle_gan_model.compute_output_shape(input_shape=(None, 256, 256, 3))

Related

Reducing size of training dataset in tensorflow 2 tutorial (Transformer model for language understanding) with '.take(n)' method does not work

I am a Tensorflow-newbie, therefore bear with me if my question is too basic or stupid ;)
I tried to reduce the size of the training dataset in the "Transformer model for language understanding"-tutorial of the Tensorflow website (https://www.tensorflow.org/tutorials/text/transformer). My intention was to make my test runs faster, when playing around with the code.
I thought I could use the dataset.take(n) method to shorten the training datasets. I added two lines right after the original dataset is read from file:
...
examples, metadata = tfds.load('ted_hrlr_translate/pt_to_en', with_info=True, as_supervised=True)
train_examples, val_examples = examples['train'], examples['validation']
# lines added to reduce dataset size
train_examples = train_examples.take(1000)
val_examples = val_examples.take(1000)
...
The resulting datasets (i.e., train_examples, val_examples) seem to have the intended size, and they seem to be working, e.g., with the tokenizer, which comes next in the turorial.
However, I get tons of error messages and warnings when I execute the code, more specifically when it enters training (i.e., train_step(inp, tar)). The error messages and warnings are too long to copy here, but perhaps an important part of it is this:
...
/home/kst/python/tf/tf_env/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:1105 set_shape
raise ValueError(
ValueError: Tensor's shape (8216, 128) is not compatible with supplied shape (4870, 128)
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.iter
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.iter
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_1
...
Some of the tensors in the training part seem to have an inappropriate size or shape.
Is there a good reason why .take(n) is not a good method to shorten datasets in Tensorflow?
Is there a better way to do it?
Thanks!
:)

I found the problem! It was not the .take() method, which caused the error message. It was the checkpoint manager:
checkpoint_path = "./checkpoints/train"
ckpt = tf.train.Checkpoint(transformer=transformer,
optimizer=optimizer)
ckpt_manager = tf.train.CheckpointManager(ckpt, checkpoint_path, max_to_keep=5)
# if a checkpoint exists, restore the latest checkpoint.
if ckpt_manager.latest_checkpoint:
ckpt.restore(ckpt_manager.latest_checkpoint)
print ('Latest checkpoint restored!!')
I seem to have had old checkpoints on my disk, which were created from a run with a larger dataset. The checkpoint manager automatically recovered the old checkpoints and wanted to continue from there. Of course, the size and shape of the old tensors did not match the new ones (e.g., the vocabulary sizes were different). This created the error message.
When I deleted the old checkpoints (i.d., the .ipynb_checkpoints directory) it all worked smoothly!
:-)

Unable to restore a layer of class TextVectorization - Text Classification

System information
Google Colab
When I run the example provided by official tensorflow basic text classification, everything runs fine until the model save, but when I load the model it gives me this error.
RuntimeError: Unable to restore a layer of class TextVectorization. Layers of class TextVectorization require that the class be provided to the model loading code, either by registering the class using #keras.utils.register_keras_serializable on the class def and including that file in your program, or by passing the class in a keras.utils.CustomObjectScope that wraps this load call.
Expected Behavior: Model should be loaded successfully and process the raw input
https://colab.research.google.com/gist/amahendrakar/8b65a688dc87ce9ca07ffb0ce50b84c7/44199.ipynb#scrollTo=fEjmSrKIqiiM
Example Link: https://tensorflow.google.cn/tutorials/keras/text_classification

I also ran into this error message (RuntimeError: Unable to restore a layer of class TextVectorization. [...]) when I implemented (and customized) the code from the "Basic Text Classification" tutorial.
Instead of running the code in a notebook, I have two scripts, one for building, training and saving the model and the other one for loading it and making predictions. (Thus, the error does not seem to be limited to Google Colab).
This is what I had to do (see https://github.com/tensorflow/tensorflow/issues/45231):
First, I added this line in the first script before the function definition and built, trained and saved the model again:
#tf.keras.utils.register_keras_serializable()
def custom_standardization(input_data):
[...]
# Save model as SavedModel
export_model.save(model_path, save_format='tf')
Secondly, I also had to add the same line and the whole function definition in the second script to make sure that it works if I restart(!) ipython (where I currently run the scripts) and only run the second script:
#tf.keras.utils.register_keras_serializable()
def custom_standardization(input_data):
lowercase = tf.strings.lower(input_data)
stripped_html = tf.strings.regex_replace(lowercase, '<br />', ' ')
return tf.strings.regex_replace(stripped_html,
'[%s]' % re.escape(string.punctuation),
'')
[...]
# Load model
reloaded_model = tf.keras.models.load_model(model_path)
# Make predictions
predictions = reloaded_model.predict(examples)
Note: If I run the second script without restarting ipython after running the first script, I get this error:
ValueError: Custom>custom_standardization has already been registered [...]
Alternatively, you can just use the default standardization method in the vectorizer layer when building the model:
vectorize_layer = TextVectorization(
standardize="lower_and_strip_punctuation",
max_tokens=max_features,
output_mode='int',
output_sequence_length=sequence_length)

I got something working as Hassan describes it, I think. Not sure it's the right way, but it seems to work for me...
I define, train, and archive the model in one notebook
I un-archive it, load it, and use it for predictions from another notebook.
See here: https://github.com/OlivierLD/oliv-ai/tree/master/JupyterNotebooks/tf-tutorials/sentiment-analysis

Issue with keras fit_generator epoch

I'm creating an LSTM Model for Text generation using Keras. As the dataset(around 25 novels,which has around 1.4 million words) I'm using can't be processed at once(An Memory issue with converting my outputs to_Categorical()) I created a custom generator function to read the Data in.
# Data generator for fit and evaluate
def generator(batch_size):
start = 0
end = batch_size
while True:
x = sequences[start:end,:-1]
#print(x)
y = sequences[start:end,-1]
y = to_categorical(y, num_classes=vocab_size)
#print(y)
yield x, y
if batch_size == len(lines):
break;
else:
start += batch_size
end += batch_size
when i excecute the model.fit() method, after 1 epoch is done training the following error is thrown.
UnknownError: [_Derived_] CUDNN_STATUS_BAD_PARAM
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1459): 'cudnnSetTensorNdDescriptor( tensor_desc.get(), data_type, sizeof(dims) / sizeof(dims[0]), dims, strides)'
[[{{node CudnnRNN}}]]
[[sequential/lstm/StatefulPartitionedCall]] [Op:__inference_train_function_25138]
Function call stack:
train_function -> train_function -> train_function
does anyone know how to solve this issue ? Thanks

From many sources in the Internet, this issue seems to occur while using LSTM Layer along with Masking Layer and while training on GPU.
Mentioned below can be the workarounds for this problem:
If you can compromise on speed, you can Train your Model on CPU rather than on GPU. It works without any error.
As per this comment, please check if your Input Sequences comprises of all Zeros, as the Masking Layer may mask all the Inputs
If possible, you can Disable the Eager Execution. As per this comment, it works without any error.
Instead of using a Masking Layer, you can try the alternatives mentioned in this link
a. Adding the argument, mask_zero = True to the Embedding Layer. or
b. Pass a mask argument manually when calling layers that support this argument
Last solution can be to remove Masking Layer, if that is possible.
If none of the above workaround solves your problem, Google Tensorflow Team is working to resolve this error. We may have to wait till that is fixed.
Hope this information helps. Happy Learning!

MXNet predict not working when model is cached

I have an MXNet MultilayerPerceptron inside a class MyModel.
I first load the trained weights from a file.
I am performing prediction with the MLP like this:
class MyModel:
...
def predict(self, X):
data_iterator = mx.io.NDArrayIter(data=X,
batch_size=self.model.data_shapes[0].shape[0], shuffle=False)
predictions_npa = self.model.predict(data_iterator ).asnumpy()
where X is a numpy array (1,777)
Now the first time i'm performing a MyModel.predict this works perfectly.
I then store the MyModel instance in a functools.LRUCache and trying to perform a second time the prediction with the exact same input.
And every time when doing that, my python process just stops doing anything, no logs, no actions, neither does it exit. I just know that when I try to inspect the result of self.model.predict(data_iterator ) in my PyCharm debugger I get a loading error.
So I'm a bit confused with what's happening there, if anyone had an idea it could be a great help!
Thanks

This maybe be because you have to recreate data_iterator. data_iterator is exausted once it has finished and .next() call will raise the error.

where do I find bidirectional_rnn in tensorflow 1.0.0?

I am using some code from here: https://github.com/monikkinom/ner-lstm with tensorflow. I think the code was written for an older version of tensorflow, I am using version 1.0.0. I used tf_upgrade.py to upgrade model.py in that github repos, but I am still getting the error:
output, _, _ = contrib_rnn.bidirectional_rnn(fw_cell, bw_cell,
AttributeError: 'module' object has no attribute 'bidirectional_rnn'
this is after I changed the bidirectional_rnn call to use contrib_rnn which is:
from tensorflow.contrib.rnn.python.ops import core_rnn as contrib_rnn
The old call was
output, _, _ = tf.nn.bidirectional_rnn(fw_cell, bw_cell,
tf.unpack(tf.transpose(self.input_data, perm=[1, 0, 2])),
dtype=tf.float32, sequence_length=self.length)
which also doesn't work.
I had to change the LSTMCell, DroputWrapper, etc. to rnn.LSTMCell, but they seem to work fine. It is the bidirectional_rnn that I can't figure out how to change.

In TensorFlow 1.0, you have the choice of two bidirectional RNN functions:
tf.nn.bidirectional_dynamic_rnn()
tf.contrib.rnn.static_bidirectional_rnn()

Maybe you can try to reimplement a bidirectional RNN by simply wrapping into a single class two monodirectional RNNs with the parameter "go_backwards=True" set on one of them. Then you can also have control over the type of merge done with the outputs. Maybe taking a look at the implementation in https://github.com/fchollet/keras/blob/master/keras/layers/wrappers.py (see the class Bidirectional) could get you started.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Tensorflow's cycle gan model: error in fit method - python

The solution is to add this line: # Create cycle gan model cycle_gan_model = CycleGan( generator_G=gen_G, generator_F=gen_F, discriminator_X=disc_X, discriminator_Y=disc_Y ) # add following line: cycle_gan_model.compute_output_shape(input_shape=(None, 256, 256, 3))

Related

Reducing size of training dataset in tensorflow 2 tutorial (Transformer model for language understanding) with '.take(n)' method does not work

Unable to restore a layer of class TextVectorization - Text Classification

Issue with keras fit_generator epoch

MXNet predict not working when model is cached

where do I find bidirectional_rnn in tensorflow 1.0.0?

Categories

Resources