cell_clip and proj_clip parameters in Tensorflow LSTMCell - python

I'm learning TF to train a language model for my project. I found that the LSTMCell initializer has two parameters, cell_clip and proj_clip, that I don't understand, and I haven't found any reference about them online either. Can anyone help me understand these two parameters?
Best

In the official implementation of rnn_cell, TensorFlow defines what proj_clip is:
proj_clip: (optional) A float value. If num_proj > 0 and proj_clip is
provided, then the projected values are clipped elementwise to within
[-proj_clip, proj_clip]
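For reference, here is a minimal sketch (TF 1.x, tf.nn.rnn_cell.LSTMCell; the sizes and clip values below are arbitrary placeholders) of how the two parameters are passed. cell_clip clips the cell state elementwise before the cell output activation, and proj_clip clips the projected output when num_proj is set; both simply keep activations in a bounded range:
import tensorflow as tf

cell = tf.nn.rnn_cell.LSTMCell(
    num_units=128,
    num_proj=64,     # project the 128-unit output down to 64 dimensions
    cell_clip=3.0,   # clip the cell state elementwise to [-3.0, 3.0]
    proj_clip=3.0)   # clip the projected output elementwise to [-3.0, 3.0]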

TensorFlow Custom Estimator predict throwing value error

Note: this question has an accompanying, documented Colab notebook.
TensorFlow's documentation can, at times, leave a lot to be desired. Some of the older docs for lower-level APIs seem to have been expunged, and most newer documents point towards using higher-level APIs such as TensorFlow's subset of Keras or Estimators. This would not be so problematic if the higher-level APIs did not so often rely closely on their lower levels. A case in point: Estimators (especially the input_fn when using TensorFlow Records).
Over the following Stack Overflow posts:
Tensorflow v1.10: store images as byte strings or per channel?
Tensorflow 1.10 TFRecordDataset - recovering TFRecords
Tensorflow v1.10+ why is an input serving receiver function needed when checkpoints are made without it?
TensorFlow 1.10+ custom estimator early stopping with train_and_evaluate
TensorFlow custom estimator stuck when calling evaluate after training
and with the gracious assistance of the TensorFlow / StackOverflow community, we have moved closer to doing what the TensorFlow "Creating Custom Estimators" guide has not: demonstrating how to make an estimator one might actually use in practice (rather than a toy example), e.g. one which:
has a validation set for early stopping if performance worsens,
reads from TF Records because many datasets are larger than TensorFlow's recommended 1 GB limit for in-memory data, and
saves its best version whilst training.
While I still have many questions regarding this (from the best way to encode data into a TF Record, to what exactly the serving_input_fn expects), there is one question that stands out more prominently than the rest:
How to predict with the custom estimator we just made?
Under the documentation for predict, it states:
input_fn: A function that constructs the features. Prediction continues until input_fn raises an end-of-input exception (tf.errors.OutOfRangeError or StopIteration). See Premade Estimators for more information. The function should construct and return one of the following:
A tf.data.Dataset object: Outputs of Dataset object must have same constraints as below.
features: A tf.Tensor or a dictionary of string feature name to Tensor. features are consumed by model_fn. They should satisfy the expectation of model_fn from inputs.
A tuple, in which case the first item is extracted as features.
Most likely, if one is using estimator.predict, they are using data in memory, such as a dense tensor (because a held-out test set would likely go through evaluate).
So I, in the accompanying Colab, create a single dense example, wrap it up in a tf.data.Dataset, and call predict to get a ValueError.
I would greatly appreciate it if someone could explain to me how I can:
load my saved estimator
given a dense, in memory example, predict the output with the estimator
to_predict = random_onehot((1, SEQUENCE_LENGTH, SEQUENCE_CHANNELS))\
    .astype(tf_type_string(I_DTYPE))
pred_features = {'input_tensors': to_predict}
pred_ds = tf.data.Dataset.from_tensor_slices(pred_features)
predicted = est.predict(lambda: pred_ds, yield_single_examples=True)
next(predicted)
ValueError: Tensor("IteratorV2:0", shape=(), dtype=resource) must be from the same graph as Tensor("TensorSliceDataset:0", shape=(), dtype=variant).
When you use the tf.data.Dataset module, it actually defines an input graph which is independent from the model graph. What happens here is that you first created a small graph by calling tf.data.Dataset.from_tensor_slices(), then the estimator API created a second graph by calling dataset.make_one_shot_iterator() automatically. These two graphs can't communicate, so it throws an error.
To circumvent this, you should never create a dataset outside of estimator.train/evaluate/predict. This is why everything data related is wrapped inside input functions.
def predict_input_fn(data, batch_size=1):
    dataset = tf.data.Dataset.from_tensor_slices(data)
    return dataset.batch(batch_size).prefetch(None)

predicted = est.predict(lambda: predict_input_fn(pred_features), yield_single_examples=True)
next(predicted)
Now the graph is not created outside of the predict call.
I also added dataset.batch() because the rest of your code expects batched data and it was throwing a shape error. Prefetch just speeds things up.

Tensorflow LSTM model parameter learning inside parameter

I'm trying to train my LSTM model in TensorFlow, and my module has to calculate a parameter inside another parameter. I want to train both parameters together.
More details are in the picture below.
I think that the TensorFlow LSTM module's input must be a complete sequence, and the parameters something like tf.placeholder.
How can I do this in TensorFlow? Or can you recommend another framework better suited to this task?
Sorry for my poor English.
First of all, your usage of the word parameter is quite confusing. Normally, parameters refers to trainable parameters, and therefore to every variable that is trained by the optimizer. There are also so-called hyper-parameters, which have to be set by hand, e.g. the model topology.
TensorFlow works with tensors, which are representations of the data used to build the workflow; they are filled with data at run time via placeholders, which act as entry points for the data.
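As a tiny illustration of that feeding mechanism (a sketch in the TF 1.x style; the shape and values are arbitrary, not taken from your model):
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 3])   # entry point for the data
y = tf.reduce_sum(x, axis=1)                      # graph built from tensors

with tf.Session() as sess:
    # the placeholder is filled at run time via feed_dict
    print(sess.run(y, feed_dict={x: [[1., 2., 3.]]}))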
Also, if you have trouble building your model in TensorFlow, there is also Keras. Keras can run with TensorFlow as its backend, but model building is much easier; it is also available in the TensorFlow API as tf.keras. In Keras, one or more LSTMs are simplified into a layer which can be added to your model, for example as sketched below.
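A minimal sketch (the layer sizes, sequence shape, and loss are placeholders chosen for illustration, not taken from your problem):
import tensorflow as tf

model = tf.keras.models.Sequential([
    # an LSTM layer over sequences with 10 features per timestep
    tf.keras.layers.LSTM(64, input_shape=(None, 10)),
    # e.g. a single regression output
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')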
If you like a more specific answer to your question, please provide code to describe your problem.

what does the optional argument "constants" do in the keras recurrent layers?

I'm learning to build a customized sequence-to-sequence model with keras, and have been reading some code that other people wrote, for example here. I got confused in the call method regarding constants. There is the keras "Note on passing external constants to RNNs", but I'm having trouble understanding what the constants do to the model.
I did go through the attention model and the pointer network papers, but maybe I've missed something.
Any reference to the modeling details would be appreciated! Thanks in advance.
Okay, just as a reference in case someone else stumbles across this question: I went through the code in the recurrent.py file. I think get_constants is getting the dropout mask and the recurrent dropout mask, then concatenating them with the [h, c] states (the order of these four elements is required in the LSTM step method). After that it doesn't matter anymore to the original LSTM cell, but you can add your own 'constants' (in the sense that they won't be learned) to pass from one timestep to the next. All constants are appended to the returned [h, c] states implicitly. In Keon's example, the fifth position of the returned state is the input sequence, and it can be referenced at every timestep by calling states[-1].
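As a rough sketch of the same idea using the keras.layers.RNN constants mechanism (the AddConstantCell below is a made-up toy cell, not from Keon's code; it only shows a tensor being handed to every timestep without ever being learned):
import tensorflow as tf
from tensorflow import keras

class AddConstantCell(keras.layers.Layer):
    # toy cell: new state = old state + input + constant (the constant is never trained)
    def __init__(self, units, **kwargs):
        super(AddConstantCell, self).__init__(**kwargs)
        self.units = units
        self.state_size = units

    def call(self, inputs, states, constants=None):
        const = constants[0]                    # the external, non-learned tensor
        new_state = states[0] + inputs + const
        return new_state, [new_state]

x = keras.Input(shape=(None, 8))                # a sequence of 8-dim vectors
c = keras.Input(shape=(8,))                     # the 'constant', the same at every timestep
y = keras.layers.RNN(AddConstantCell(8))(x, constants=c)
model = keras.Model([x, c], y)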

TensorFlow - How to minimize function of one variable?

I've been given a fully trained model by another researcher that has inputs as placeholders. Regarding it as a function f(x), I would like to find x to minimize my distance metric (loss function) dist(x, f(x)). This could be something like the euclidean distance between the two points.
I tried to use TensorFlow's built-in optimizer functions. The issue is that tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=[input_placeholder]) fails, complaining that input_placeholder isn't of a supported type. Thus, I cannot get gradients for my input.
How can I optimize a function in TensorFlow when the inputs have to be specified in this way? Unfortunately, these placeholders are not passed through a Variable first, and I have to treat that model as a black box.
Using the Keras functional API detailed in this question, I created a dense layer with no bias to sit right before the model I was given. Holding its input as a constant all-ones vector, I optimized the joined model using only the Variable in the dense layer, giving me the optimal vector as the output of that layer.
All TensorFlow Optimizer subclasses allow you to minimize while only modifying a particular set of Variables, which I got out of Keras fairly simply.
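A rough sketch of that trick (TF 1.x style). Everything here is assumed for illustration: INPUT_DIM, the black_box_model stand-in for the given frozen model, and the dist metric are not from the original post:
import tensorflow as tf

INPUT_DIM = 10

def black_box_model(t):
    # stand-in for the researcher's frozen model f(x); in reality this would be
    # the given graph re-wired to consume t instead of its input placeholder
    return tf.tanh(t)

def dist(a, b):
    # example distance metric: squared Euclidean distance
    return tf.reduce_sum(tf.squared_difference(a, b))

ones = tf.ones((1, INPUT_DIM))                       # constant all-ones input
free_layer = tf.keras.layers.Dense(INPUT_DIM, use_bias=False)
x = free_layer(ones)                                 # x = ones @ kernel, so x acts as a free Variable
loss = dist(x, black_box_model(x))

# only the dense layer's kernel is updated; the given model stays untouched
train_op = tf.train.AdamOptimizer(1e-4).minimize(
    loss, var_list=free_layer.trainable_weights)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        sess.run(train_op)
    optimal_x = sess.run(x)                          # the vector minimizing dist(x, f(x))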

TensorFlow BasicLSTMCell vs LSTMFusedBlockCell

I am new to TensorFlow. I have managed to build a graph that uses LSTMs to train a basic model using a BasicLSTMCell, based on the TensorFlow tutorial.
But I need to make it faster. I have seen a comparison here and, since I do not have an Nvidia GPU, the LSTMBlockFusedCell seems to be the best option. I had a look at the documentation and noticed that the signatures of the __init__() and __call__() functions are different. Specifically, I am worried about the cell_clip parameter in __init__() and the sequence_length in __call__(). What is more, the inputs tensor is of shape [time_len, batch_size, input_size]; isn't that different from that of the basic cell ([batch_size, time_len, input_size])? I do not want to use peepholes, so I will leave that set to False (the default).
Could someone explain if there are any other differences (apart from an improvement in the performance) between the BasicLSTMCell and the LSTMBlockFusedCell and how to properly set the parameters mentioned above to achieve the same result as the original?
The documentation for LSTMBlockCell says it should be drop-in compatible with LSTMCell, so the same arguments should have the same meaning.
Whether the input tensor is batch-first or time-first is unrelated to the cell and related instead to the dynamic_rnn / static_rnn you're using.
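For example, a minimal sketch (TF 1.x; the shapes and num_units are arbitrary placeholders) showing that the layout is chosen at the dynamic_rnn level, not on the cell:
import tensorflow as tf

# batch-major input: [batch_size, time_len, input_size] (time_major=False is the default)
inputs_bm = tf.placeholder(tf.float32, [None, 50, 32])
with tf.variable_scope("batch_major"):
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=128)
    outputs_bm, state_bm = tf.nn.dynamic_rnn(cell, inputs_bm, dtype=tf.float32)

# time-major input: [time_len, batch_size, input_size]
inputs_tm = tf.placeholder(tf.float32, [50, None, 32])
with tf.variable_scope("time_major"):
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=128)
    outputs_tm, state_tm = tf.nn.dynamic_rnn(cell, inputs_tm, dtype=tf.float32,
                                             time_major=True)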
