Training with mixed precision using the TensorFlow Estimator API

Does anyone have experience with mixed-precision training using the TensorFlow Estimator API?
I tried casting my inputs to tf.float16 and the network outputs back to tf.float32. To scale the loss I used tf.contrib.mixed_precision.LossScaleOptimizer.
The error message I get is relatively uninformative: "Tried to convert 'x' to a tensor and failed. Error: None values not supported".

I found the issue: I used tf.get_variable to store the learning rate. This variable has no gradient. Normal optimizers do not care, but tf.contrib.mixed_precision.LossScaleOptimizer crashes. Therefore, make sure such variables are not added to tf.GraphKeys.TRAINABLE_VARIABLES, for example by creating them with trainable=False.
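For context on why the loss is scaled at all, and why a variable without a gradient can get in the way, here is a minimal sketch in plain Python. It is not the tf.contrib implementation: the helper names (to_float16, unscale), the fixed LOSS_SCALE of 1024, and the variable names are all assumptions made for illustration. It shows a small gradient underflowing to zero in half precision, surviving when scaled, and an unscaling step that has to skip None gradients explicitly.

```python
import struct

def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

LOSS_SCALE = 1024.0  # hypothetical fixed loss scale, chosen for illustration

def unscale(grads_and_vars, loss_scale):
    """Divide each gradient by the loss scale, skipping variables with no gradient.

    Without the `is not None` check, `None / loss_scale` would raise an error,
    which is the kind of failure a non-trainable variable can trigger.
    """
    return [(g / loss_scale, v) for g, v in grads_and_vars if g is not None]

true_grad = 1e-8

# Stored directly in float16, the gradient is too small and becomes zero.
unscaled_fp16 = to_float16(true_grad)

# Scaling the loss scales the gradient too, which keeps it representable.
scaled_fp16 = to_float16(true_grad * LOSS_SCALE)

# Unscaling is done in higher precision and recovers roughly the true value.
recovered = scaled_fp16 / LOSS_SCALE

# A learning-rate variable has no gradient, so its entry is None.
grads_and_vars = [(scaled_fp16, "weights"), (None, "learning_rate")]
result = unscale(grads_and_vars, LOSS_SCALE)
```

In this sketch the entry for "learning_rate" is simply dropped. Keeping such variables out of the trainable collection avoids producing the None gradient in the first place.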

Related

Using Cohen's kappa with tensorflow and keras

I am desperately trying to use Cohen's kappa as either a loss function or an evaluation metric in my Keras neural network. I have tried many different implementations of it found online, and none of them seem to be maintained, in particular because tf.contrib no longer exists in TensorFlow 2.0. Any help in pointing me in the right direction of a working implementation would be much appreciated!
When I use the TensorFlow Addons class documented at https://www.tensorflow.org/addons/api_docs/python/tfa/metrics/cohens_kappa, I keep getting the following error and have no idea how to go about debugging it:
ValueError: Number of samples in y_true and y_pred are different
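As a reference point for debugging, here is a minimal plain-Python sketch of unweighted Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. It is not the TensorFlow Addons implementation; the function names are hypothetical. It assumes both inputs are flat lists of class indices of the same length. One possible cause of the error above (an assumption, since the question shows no code) is that one input holds one-hot or probability rows while the other holds class indices, so the to_class_index helper reduces a row to its index first.

```python
from collections import Counter

def to_class_index(row):
    """Return the index of the largest entry in a one-hot or probability row."""
    return max(range(len(row)), key=lambda i: row[i])

def cohens_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa for two equal-length lists of class labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of samples")
    n = len(y_true)
    # Observed agreement: fraction of samples where both labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one class occurs
    return (observed - expected) / (1.0 - expected)

# Labels as class indices, predictions as probability rows (illustrative data).
labels = [0, 1, 2, 2, 1, 0]
probabilities = [
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.1, 0.7, 0.2],
    [0.3, 0.6, 0.1],
    [0.7, 0.2, 0.1],
]
predicted = [to_class_index(row) for row in probabilities]
kappa = cohens_kappa(labels, predicted)
```

Comparing the value from a hand-written version like this against the library metric on a small batch can help confirm whether the inputs have the shape the metric expects.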

How to use keras model inside other model in TPU

I am trying to convert a Keras model to a TPU model in Google Colab, but this model has another model inside it.
Take a look at the code:
https://colab.research.google.com/drive/1EmIrheKnrNYNNHPp0J7EBjw2WjsPXFVJ
This is a modified version of one of the examples in the google tpu documentation:
https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/fashion_mnist.ipynb
If the sub_model is converted and used directly it works, but if the sub model is inside another model it does not. I need this kind of nested network because I am trying to train a GAN that has two networks inside (gan = generator + discriminator), so if this test works it will probably work with the GAN too.
I have tried several things:
Converting the model to TPU without converting the sub model. In that case, when training starts, an error related to the inputs of the sub model is raised.
Converting both the model and the sub model to TPU. In that case an error is raised when converting the "parent" model, and the exception only says "layers" at the end.
Converting only the sub model to TPU. In that case no error is raised, but the training is not accelerated by the TPU and is extremely slow, as if no conversion to TPU had been made at all.
Using a fixed batch size or not. Both give the same result: the model does not work.
Any ideas? Thanks a lot.

Divide the problem into parts. First, use only the submodel on the TPU. Then put something simple in place of the submodel and run the outer model on the TPU. If this does not work, create something very simple that has a similar structure, built from models you are sure work, and then add things step by step until you reach the complex model you want to run on the TPU.
I struggle with such things too. At the very beginning, using MNIST, I trained the model and extracted the coefficients, rewrote ReLU, dense, dropout, and the NN matrix operations myself, and ran the model using numpy, then cupy, then pyopencl. Then I replaced those functions with my own raw CUDA C and OpenCL functions, so that by going deeper and simpler I could find what is wrong when something does not work. Finally I wrote my own genetic selective training algorithm and learned a lot.
Most importantly, it gave me the opportunity to try some crazy ideas for training, modelling, manipulating, and making sense of NN coefficients.
The problem, in my opinion, is that TF, Keras, etc. are too high level. With optimizers and solvers there is too much that is unknown. Even the neural networks are not under control. A GAN is problematic to train: it does not converge every time, and most of the time it takes days to train. Even if you do train it, you have no idea how it converges. Most of the tricks and techniques that protect you from vanishing gradients are not mathematically backed, yet they nevertheless work amazingly well. (?!?)
**Go simpler and deeper, and add complexity step by step. Follow a practice in which you comprehend as much as you can.** It will cost some time and energy, but you will benefit from it tremendously, in my opinion.
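To make the "rewrite it yourself" idea concrete, here is a minimal sketch of a dense layer and ReLU written by hand in plain Python, with no framework involved. The function names, the layer layout (a list of weights, biases, and an optional activation), and the example numbers are all assumptions for illustration. This is not the answerer's actual code, and in practice the weights would be the coefficients extracted from a trained model.

```python
def relu(values):
    """Element-wise ReLU: negative entries become zero."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer; weights[i][j] connects input i to output j."""
    return [
        sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
        for j in range(len(biases))
    ]

def forward(inputs, layers):
    """Run inputs through a list of (weights, biases, activation) layers."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases)
        if activation is not None:
            activations = activation(activations)
    return activations

# Hypothetical tiny network: 2 inputs -> 2 hidden units (ReLU) -> 1 output.
layers = [
    ([[1.0, 0.5], [0.5, -1.0]], [0.0, 0.1], relu),
    ([[1.0], [2.0]], [-0.2], None),
]
hidden = relu(dense([1.0, -2.0], layers[0][0], layers[0][1]))
output = forward([1.0, -2.0], layers)
```

Checking each intermediate value (such as hidden above) against the framework's output for the same input is one way to locate where a complex model starts to diverge.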

Shape error only when TPU training Keras model

First off, this is not my code. I just changed it so that it can train on a TPU. The original author is here. I am able to run it on the GPU-accelerated runtime in a Colaboratory notebook, but it seems to break when I use the TPU-accelerated runtime.
Here is my notebook. It just gives me an error saying that the activation function is not the right size:
ValueError: Error when checking target: expected activation_21 to have shape (1,) but got array with shape (205,)
I would appreciate any help I can get, as I have spent about 3 hours debugging.

Since you are one-hot encoding the labels and therefore they are not sparse, you need to use 'categorical_accuracy' as the metric:
model.compile(..., metrics=['categorical_accuracy'])
or more succinctly use 'accuracy' to let Keras infer the right metric based on the loss function used (which in this case would be 'categorical_accuracy' since you are using categorical_crossentropy as the loss function):
model.compile(..., metrics=['accuracy'])
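The difference between the two kinds of accuracy comes down to the label format each one expects. The following plain-Python sketch is not Keras's implementation; the function names mirror the Keras metric names only for readability, and the data is made up. It shows that one-hot labels have one entry per class (shape (205,) in the question), while sparse labels have a single class index per sample (shape (1,)).

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def categorical_accuracy(y_true_one_hot, y_pred_probs):
    """Accuracy when each label is a one-hot row with one entry per class."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true_one_hot, y_pred_probs))
    return hits / len(y_true_one_hot)

def sparse_categorical_accuracy(y_true_index, y_pred_probs):
    """Accuracy when each label is a single integer class index."""
    hits = sum(t == argmax(p) for t, p in zip(y_true_index, y_pred_probs))
    return hits / len(y_true_index)

# Illustrative predictions for three samples over three classes.
y_pred = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.2, 0.3, 0.5]]

# The same labels in both formats.
one_hot_labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0]]
sparse_labels = [argmax(row) for row in one_hot_labels]

acc_categorical = categorical_accuracy(one_hot_labels, y_pred)
acc_sparse = sparse_categorical_accuracy(sparse_labels, y_pred)
```

Both functions give the same result when each is fed the label format it expects. Feeding one-hot rows to the sparse variant is the kind of mismatch that produces a shape complaint like the one above.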

TensorFlow Custom Estimator predict throwing value error

Note: this question has an accompanying, documented Colab notebook.
TensorFlow's documentation can, at times, leave a lot to be desired. Some of the older docs for lower level apis seem to have been expunged, and most newer documents point towards using higher level apis such as TensorFlow's subset of keras or estimators. This would not be so problematic if the higher level apis did not so often rely closely on their lower levels. Case in point, estimators (especially the input_fn when using TensorFlow Records).
Over the following Stack Overflow posts:
Tensorflow v1.10: store images as byte strings or per channel?
Tensorflow 1.10 TFRecordDataset - recovering TFRecords
Tensorflow v1.10+ why is an input serving receiver function needed when checkpoints are made without it?
TensorFlow 1.10+ custom estimator early stopping with train_and_evaluate
TensorFlow custom estimator stuck when calling evaluate after training
and with the gracious assistance of the TensorFlow / Stack Overflow community, we have moved closer to doing what the TensorFlow "Creating Custom Estimators" guide has not: demonstrating how to make an estimator one might actually use in practice (rather than a toy example), e.g. one which:
has a validation set for early stopping if performance worsens,
reads from TF Records, because many datasets are larger than the TensorFlow-recommended 1 GB for in-memory data, and
saves its best version whilst training
While I still have many questions regarding this (from the best way to encode data into a TF Record, to what exactly the serving_input_fn expects), there is one question that stands out more prominently than the rest:
How to predict with the custom estimator we just made?
Under the documentation for predict, it states:
input_fn: A function that constructs the features. Prediction continues until input_fn raises an end-of-input exception (tf.errors.OutOfRangeError or StopIteration). See Premade Estimators for more information. The function should construct and return one of the following:
A tf.data.Dataset object: Outputs of Dataset object must have same constraints as below.
features: A tf.Tensor or a dictionary of string feature name to Tensor. features are consumed by model_fn. They should satisfy the expectation of model_fn from inputs.
A tuple, in which case the first item is extracted as features.
Most likely, if one is using estimator.predict, they are using data in memory such as a dense tensor (because a held-out test set would likely go through evaluate).
So I, in the accompanying Colab, create a single dense example, wrap it up in a tf.data.Dataset, and call predict to get a ValueError.
I would greatly appreciate it if someone could explain to me how I can:
load my saved estimator
given a dense, in memory example, predict the output with the estimator

to_predict = random_onehot((1, SEQUENCE_LENGTH, SEQUENCE_CHANNELS))\
.astype(tf_type_string(I_DTYPE))
pred_features = {'input_tensors': to_predict}
pred_ds = tf.data.Dataset.from_tensor_slices(pred_features)
predicted = est.predict(lambda: pred_ds, yield_single_examples=True)
next(predicted)
ValueError: Tensor("IteratorV2:0", shape=(), dtype=resource) must be from the same graph as Tensor("TensorSliceDataset:0", shape=(), dtype=variant).
When you use the tf.data.Dataset module, it actually defines an input graph which is independent of the model graph. What happens here is that you first created a small graph by calling tf.data.Dataset.from_tensor_slices(), and then the estimator API created a second graph by calling dataset.make_one_shot_iterator() automatically. These two graphs can't communicate, so it throws an error.
To circumvent this, you should never create a dataset outside of estimator.train/evaluate/predict. This is why everything data-related is wrapped inside input functions.
def predict_input_fn(data, batch_size=1):
    dataset = tf.data.Dataset.from_tensor_slices(data)
    return dataset.batch(batch_size).prefetch(None)
predicted = est.predict(lambda: predict_input_fn(pred_features), yield_single_examples=True)
next(predicted)
Now, the graph is not created outside of the predict call.
I also added dataset.batch() because the rest of your code expects batched data and it was throwing a shape error. Prefetch just speeds things up.

Tensorflow LSTM model parameter learning inside parameter

I'm trying to train my LSTM model in TensorFlow, and my module has to calculate a parameter inside another parameter. I want to train both parameters together.
More details are in the picture below.
I think that the input to TensorFlow's LSTM module must be a perfect sequence, with parameters like "tf.placeholder".
How can I do this in TensorFlow? Or can you recommend another framework that is better suited than TensorFlow to this task?
Sorry for my poor english.

First of all, your usage of the word "parameter" is quite confusing. Normally, "parameters" refers to trainable parameters, i.e. every variable that is trained by the optimizer. There are also so-called hyper-parameters, which have to be set by hand, e.g. the model topology.
TensorFlow works with tensors, which are representations of data used to build the workflow. They are filled with data at run time via a placeholder, which acts as an entry point for the data.
Also, if you have trouble building your model in TensorFlow, there is Keras. Keras can run with TensorFlow as its backend, but model building is much easier. Keras is also available in the TensorFlow API as tf.keras. In Keras, one or multiple LSTMs are simplified into a layer which can be added to your model.
If you would like a more specific answer to your question, please provide code that describes your problem.
