How to make predictions with tf.estimator.Estimator from checkpoint? - python

I just trained a CNN to recognise sunspots with tensorflow. My model is pretty much the same as this.
The problem is that I cannot find anywhere a clear explanation on how to make predictions with the checkpoint generated by the training phase.
Tried using the standard restore method:
saver = tf.train.import_meta_graph('./model/model.ckpt.meta')
saver.restore(sess,'./model/model.ckpt')
but then I cannot figure out how to run it.
Tried using tf.estimator.Estimator.predict() like this:
# Create the Estimator (should reload the last checkpoint but it doesn't)
sunspot_classifier = tf.estimator.Estimator(
    model_fn=cnn_model_fn, model_dir="./model")

# Set up logging for predictions
# Log the values in the "Softmax" tensor with label "probabilities"
tensors_to_log = {"probabilities": "softmax_tensor"}
logging_hook = tf.train.LoggingTensorHook(
    tensors=tensors_to_log, every_n_iter=50)

# predict with the model and print results
pred_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": pred_data},
    shuffle=False)
pred_results = sunspot_classifier.predict(input_fn=pred_input_fn)
print(pred_results)
but all it prints is <generator object Estimator.predict at 0x10dda6bf8>.
If I use the same code with tf.estimator.Estimator.evaluate() instead, it works like a charm: it reloads the model, performs the evaluation and sends the results to TensorBoard.
I know there are many similar questions, but I couldn't find one that worked for me.

sunspot_classifier.predict(input_fn=pred_input_fn) returns a generator, so pred_results is a generator object. To get a value out of it you need to iterate it, e.g. with next(pred_results).
The solution is
print(next(pred_results))
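For completeness, a minimal sketch of consuming the whole generator, assuming the model_fn (like the tutorial's) exposes its softmax output under a "probabilities" key in the EstimatorSpec predictions dict:

pred_results = sunspot_classifier.predict(input_fn=pred_input_fn)
for pred in pred_results:
    # Each element is a dict holding the values passed to EstimatorSpec(predictions=...)
    print(pred["probabilities"])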

Related

Hugging Face not able to reload all weights after training

I have recently been using a RobertaLarge model, on which I perform downstream training using the "Trainer" package.
All goes well: I see the loss going down, and I manually compare some results against the validation dataset.
The problem arises when I try to save the model and reload it afterwards.
I keep seeing this warning when trying to reload the model:
Some weights of the model checkpoint at Roberta_trained_1epoch were not used when initializing RobertaPreTrainedModel: ['module.roberta.encoder.layer.10.output.dense.bias', [........................................340_LAYERS_..................................]
'module.roberta.encoder.layer.6.attention.self.key.bias', 'module.roberta.encoder.layer.22.output.dense.weight', 'module.roberta.encoder.layer.3.attention.self.key.bias', 'module.roberta.encoder.layer.15.attention.self.value.bias', 'module.roberta.encoder.layer.15.attention.self.query.bias', 'module.roberta.encoder.layer.2.attention.self.value.bias']
I looked extensively for an answer to why this happens and so far couldn't find a solution. Some claim it is just a warning and nothing is wrong, but I got suspicious, did some manual checks, and indeed the model seems... virgin.
I'm using Trainer.save_model('save_here') after training, and RobertaForTokenClassification.from_pretrained('save_here', local_files_only=True) to reload it.
However, the results show that the model is clearly not loading correctly.
Training code:
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=ds_train,
    eval_dataset=ds_valid,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)

trainer.train()
trainer.evaluate()
trainer.save_model('save_here')
This results in an evaluation loss of 0.002.
Reloading and re-evaluation:
model = RobertaForTokenClassification.from_pretrained('save_here', local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained('tokenizers_saved')
dl_valid = DataLoader(ds_valid, batch_size=Config.batch_size, shuffle=True)

with torch.no_grad():
    for index, data in enumerate(dl_valid):
        batch_input_ids = data['input_ids'].to(device, dtype=torch.long)
        batch_att_mask = data['attention_mask'].to(device, dtype=torch.long)
        batch_target = data['label_ids'].to(device, dtype=torch.long)
        output = model(batch_input_ids, token_type_ids=None, attention_mask=batch_att_mask, labels=batch_target)
        step_loss, eval_prediction = output['loss'], output['logits']
        eval_prediction = np.argmax(eval_prediction.detach().to('cpu').numpy(), axis=2)
        predictions.append(eval_prediction)
        reals.append(batch_target)
        eval_loss += step_loss
print(eval_loss)
This results in a loss between 0.9 and 1.2 (varying randomly after each load).
I found out what was wrong, and will share it here since others may have the same issue.
My problem was that I had wrapped my model in DataParallel: model = nn.DataParallel(model)
So it seems that Trainer can't save a wrapped model properly and get it back the usual way.
As a workaround:
model = trainer.model
model.module.save_pretrained('save_here')
....
# afterwards in another machine
....
model = RobertaForTokenClassification.from_pretrained('save_here')
I still think this should be handled differently.
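A related workaround, not part of the original answer but a common pattern: strip the "module." prefix that nn.DataParallel adds to every parameter name from an already-saved state dict, then load the cleaned dict. The checkpoint directory and the pytorch_model.bin filename below are assumptions; adjust them to your setup.

import torch
from transformers import RobertaForTokenClassification

# Load the raw state dict that was written while the model was wrapped in DataParallel
state_dict = torch.load('Roberta_trained_1epoch/pytorch_model.bin', map_location='cpu')

# Remove the "module." prefix so the keys match a plain (unwrapped) model
state_dict = {k[len('module.'):] if k.startswith('module.') else k: v
              for k, v in state_dict.items()}

# Hand the cleaned state dict to from_pretrained; the directory is used only for the config
model = RobertaForTokenClassification.from_pretrained(
    'Roberta_trained_1epoch', state_dict=state_dict, local_files_only=True)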

OutOfRangeError: tensorflow iterator not reinitializing between runs

I am fine-tuning an Inception model in TensorFlow with the setup below, and am feeding batches via the tf.data Dataset API. However, every time I attempt to train this model (before successfully retrieving any batches), I get an OutOfRangeError claiming that the iterator is exhausted:
Caught OutOfRangeError. Stopping Training. End of sequence
[[node IteratorGetNext (defined at <ipython-input-8-c768436e70d8>:13) = IteratorGetNext[output_shapes=[[?,224,224,3], [?,1]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator)]]
I created a function that feeds in hard-coded batches in place of get_batch, and this runs and converges without any issues, leading me to believe that the graph and session code is working properly. I also tested iterating the get_batch function in a session, and this causes no errors. The behavior I would expect is that restarting training (especially after resetting the notebook, etc.) would produce a fresh iterator over the dataset.
Code to train model:
with tf.Graph().as_default():
    tf.logging.set_verbosity(tf.logging.INFO)
    images, labels = get_batch(filenames=tf_train_record_path+train_file)
    # Create the model, use the default arg scope to configure the batch norm parameters.
    with slim.arg_scope(inception.inception_v1_arg_scope()):
        logits, ax = inception.inception_v1(images, num_classes=1, is_training=True)
    # Specify the loss function:
    tf.losses.mean_squared_error(labels, logits)
    total_loss = tf.losses.get_total_loss()
    tf.summary.scalar('losses/Total_Loss', total_loss)
    # Specify the optimizer and create the train op:
    optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
    train_op = slim.learning.create_train_op(total_loss, optimizer)
    # Run the training:
    final_loss = slim.learning.train(
        train_op,
        logdir=train_dir,
        init_fn=get_init_fn(),
        number_of_steps=1)
Code to get batches using the Dataset API:
def get_batch(filenames):
    dataset = tf.data.TFRecordDataset(filenames=filenames)
    dataset = dataset.map(parse)
    dataset = dataset.batch(2)
    iterator = dataset.make_one_shot_iterator()
    data_X, data_y = iterator.get_next()
    return data_X, data_y
This previously asked question resembles the issue I am experiencing; however, I am not using a batch_join call. I am not sure if this is an issue with slim.learning.train, restoring from a checkpoint, or scope. Any help would be appreciated!
Your input pipeline looks OK. The problem might be a damaged TFRecords file. You can try your code with random data, or use your images as numpy arrays with tf.data.Dataset.from_tensor_slices().
Your parse function may also cause problems; try to print your image/label with sess.run.
I'd also advise using the Estimator API instead of slim's train_op: it is much more convenient, and slim will be deprecated soon.
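A minimal sketch of such a sanity check with random numpy data (the shapes are taken from the error message above; everything else is illustrative):

import numpy as np
import tensorflow as tf

# Random stand-ins for the decoded TFRecord contents
images_np = np.random.rand(8, 224, 224, 3).astype(np.float32)
labels_np = np.random.rand(8, 1).astype(np.float32)

dataset = tf.data.Dataset.from_tensor_slices((images_np, labels_np)).batch(2)
iterator = dataset.make_one_shot_iterator()
next_images, next_labels = iterator.get_next()

with tf.Session() as sess:
    batch_x, batch_y = sess.run([next_images, next_labels])
    print(batch_x.shape, batch_y.shape)  # expected: (2, 224, 224, 3) (2, 1)

If this runs but the TFRecord-based pipeline still raises OutOfRangeError immediately, the records (or the parse function) are the likely culprit.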

Tensorflow graph results appear random after restore

I trained a model to predict the next word in a sequence. I saved the model using tf.train.Saver(). However, when I go to restore the model and supply it the same seed value, the output changes each time I run the test. For example, if I supply it with the words "happy birthday to", it will predict "you", but if I run it 10 seconds later, it will predict "rhyno". I have a feeling that this might be because I randomly initialize the internal layers with random normal weights; however, wouldn't restoring the model restore the trained values rather than reinitialize the layers? My restore code is below:
with tf.Session() as sess:
    saved_model = tf.train.import_meta_graph(
        'C:/Users/me/my_model.meta')  # load graph from training
    saved_model.restore(sess, tf.train.latest_checkpoint('./'))
    imported_graph = tf.get_default_graph()
    x = imported_graph.get_operation_by_name("ph_x").outputs[0]
    prediction = imported_graph.get_tensor_by_name('prediction:0')
    run_input = seed_values
    print(np.array2string(run_input, separator=" "))
    for _ in range(production_size):
        run_input_oh = hlp.word_to_one_hot(run_input, hp_dict, 0)
        pred = hlp.one_hot_to_word(sess.run(prediction, feed_dict={x: run_input_oh}), rev_dict)
        print(sess.run(prediction, feed_dict={x: run_input_oh}))
You called the default graph after restoring the saved weights. This will ignore the restored weights.
Solution:
First get the default graph, then restore the weights!
Try this!
with tf.Session() as sess:
    saved_model = tf.train.import_meta_graph(
        'C:/Users/me/my_model.meta')  # load graph from training
    imported_graph = tf.get_default_graph()
    saved_model.restore(sess, tf.train.latest_checkpoint('./'))
    ...
@midhun pk's answer is mistaken: calling tf.get_default_graph() does not modify the graph, and calling it before or after saved_model.restore makes no difference.
Your code seems fine (calling import_meta_graph adds the nodes of the saved graph to the current graph, and calling restore restores the states of the variables), and it's difficult to debug without more information about your model (e.g. what are run_input, seed_values, etc.?). Can you provide a minimal reproducible example?
You should be able to verify whether your variables are correctly restored by printing the value of a variable at save and restore time. Before saving, you can do print(sess.run(variable)) (or use tf.Print). After restoring, you can check the weights of the restored variables as follows: supposing your variable's name is "XX", do:
var_value = imported_graph.get_tensor_by_name("XX:0")
print(sess.run(var_value))
I was able to find the issue, and it had nothing to do with the process of restoring the saved weights.
When I first built the model for training, I created a dictionary from a text file by creating a set. In testing, I built the dictionary from the same text file, assuming that the order of elements would remain the same. Do not make this assumption. The order can change, hence the seemingly random results.
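A minimal sketch of a deterministic vocabulary build (the file name and variable names are placeholders):

# Sort the vocabulary so the word-to-index mapping is identical
# at training time and at test time.
with open('corpus.txt') as f:
    words = f.read().split()

vocab = sorted(set(words))  # sorted() fixes the otherwise arbitrary set order
word_to_id = {word: i for i, word in enumerate(vocab)}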

Log accuracy metric while training a tf.estimator

What's the simplest way to print accuracy metrics along with the loss when training a pre-canned estimator?
Most tutorials and documentation seem to address the case where you're creating a custom estimator, which seems like overkill if the intention is to use one of the available ones.
tf.contrib.learn had a few (now deprecated) Monitor hooks. TF now suggests using the hook API, but it appears that it doesn't actually come with anything that can utilize the labels and predictions to generate an accuracy number.
Have you tried tf.contrib.estimator.add_metrics(estimator, metric_fn) (doc)? It takes an initialized estimator (can be pre-canned) and adds to it the metrics defined by metric_fn.
Usage Example:
def custom_metric(labels, predictions):
    # This function will be called by the Estimator, passing its predictions.
    # Let's suppose you want to add the "mean" metric...
    # Accessing the class predictions (careful, the key name may change from one canned Estimator to another)
    predicted_classes = predictions["class_ids"]
    # Defining the metric (value and update tensors):
    custom_metric = tf.metrics.mean(labels, predicted_classes, name="custom_metric")
    # Returning as a dict:
    return {"custom_metric": custom_metric}

# Initializing your canned Estimator:
classifier = tf.estimator.DNNClassifier(feature_columns=columns_feat, hidden_units=[10, 10], n_classes=NUM_CLASSES)

# Adding your custom metrics:
classifier = tf.contrib.estimator.add_metrics(classifier, custom_metric)

# Training/Evaluating:
tf.logging.set_verbosity(tf.logging.INFO)  # Just to have some logs to display for demonstration

train_spec = tf.estimator.TrainSpec(input_fn=lambda: your_train_dataset_function(),
                                    max_steps=TRAIN_STEPS)
eval_spec = tf.estimator.EvalSpec(input_fn=lambda: your_test_dataset_function(),
                                  steps=EVAL_STEPS,
                                  start_delay_secs=EVAL_DELAY,
                                  throttle_secs=EVAL_INTERVAL)

tf.estimator.train_and_evaluate(classifier, train_spec, eval_spec)
Logs:
...
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [20/200]
INFO:tensorflow:Evaluation [40/200]
...
INFO:tensorflow:Evaluation [200/200]
INFO:tensorflow:Finished evaluation at 2018-04-19-09:23:03
INFO:tensorflow:Saving dict for global step 1: accuracy = 0.5668, average_loss = 0.951766, custom_metric = 1.2442, global_step = 1, loss = 95.1766
...
As you can see, the custom_metric is returned along with the default metrics and loss.
In addition to @Aldream's answer, you can also use TensorBoard to see some graphs of the custom_metric. To do that, add it to a TensorFlow summary like this:
tf.summary.scalar('custom_metric', custom_metric)
The cool thing when you use the tf.estimator.Estimator is that you don't need to add the summaries to a FileWriter, since it's done automatically (merging and saving them every 100 steps by default).
To view TensorBoard, open a new terminal and type:
tensorboard --logdir={$MODEL_DIR}
After that you will be able to see the graphics in your browser at localhost:6006.

Keras doesn't load the model and weights when using checkpoint

I'm using Keras to build a deep autoencoder. I used its checkpointer to load the model and the weights, but the result is always None, which I think means that the checkpoint doesn't work correctly and is not saving the weights.
Here is the code showing how I proceed:
checkpointer = ModelCheckpoint(filepath="weights.best.h5",
                               verbose=0,
                               save_best_only=True)
tensorboard = TensorBoard(log_dir='/tmp/autoencoder',
                          histogram_freq=0,
                          write_graph=True,
                          write_images=True)

input_enc = Input(shape=(input_size,))
hidden_1 = Dense(hidden_size1, activation='relu')(input_enc)
hidden_11 = Dense(hidden_size2, activation='relu')(hidden_1)
code = Dense(code_size, activation='relu')(hidden_11)
hidden_22 = Dense(hidden_size2, activation='relu')(code)
hidden_2 = Dense(hidden_size1, activation='relu')(hidden_22)
output_enc = Dense(input_size, activation='tanh')(hidden_2)

autoencoder_yes = Model(input_enc, output_enc)
autoencoder_yes.compile(optimizer='adam',
                        loss='mean_squared_error',
                        metrics=['accuracy'])

history_yes = autoencoder_yes.fit(df_noyau_norm_y, df_noyau_norm_y,
                                  epochs=200,
                                  batch_size=batch_size,
                                  shuffle=True,
                                  validation_data=(df_test_norm_y, df_test_norm_y),
                                  verbose=1,
                                  callbacks=[checkpointer, tensorboard]).history

autoencoder_yes.save_weights("weights.best.h5")
print(autoencoder_yes.load_weights("weights.best.h5"))
Can somebody help me find out a way to resolve the problem?
Thanks
No, your interpretation of load_weights returning None is not correct. load_weights is a procedure: it does not return anything, and if you assign the return value of a procedure to a variable, that variable gets the value None.
So weight saving is probably working fine; it's just your interpretation that is wrong.
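A quick generic Python illustration of that point (not specific to Keras):

result = [3, 1, 2].sort()  # list.sort() works in place and returns None
print(result)              # prints: None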
You should use save_weights_only=True. Without this, the whole model is saved, not just the weights. To be able to load the weights you must save them like this:
checkpointer = ModelCheckpoint(filepath="weights.best.h5",
                               verbose=0, save_weights_only=True,
                               save_best_only=True)
This is expected behavior, not an error. autoencoder_yes.load_weights("weights.best.h5") doesn't actually return anything, so if you try to print the output of this call you will get None.
Expected behavior
In the code that you have provided, you have trained the model and saved the weights. So, the autoencoder_yes is a keras.Model object that has the fine-tuned weights.
If you load the saved weights again in the same script, nothing visible is supposed to happen; the weights you saved are simply loaded again.
For clarity
Start with a fresh script, build the same model architecture, reload the weights from the h5 file, and then do some predictions. In that case it will silently load the pre-trained weights and make the predictions accordingly.
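A minimal sketch of such a fresh script (the layer sizes and data variables are the ones from the question and must match the training script exactly):

from keras.layers import Input, Dense
from keras.models import Model

# Rebuild exactly the same architecture as at training time
input_enc = Input(shape=(input_size,))
hidden_1 = Dense(hidden_size1, activation='relu')(input_enc)
hidden_11 = Dense(hidden_size2, activation='relu')(hidden_1)
code = Dense(code_size, activation='relu')(hidden_11)
hidden_22 = Dense(hidden_size2, activation='relu')(code)
hidden_2 = Dense(hidden_size1, activation='relu')(hidden_22)
output_enc = Dense(input_size, activation='tanh')(hidden_2)

autoencoder = Model(input_enc, output_enc)
autoencoder.load_weights("weights.best.h5")       # silently loads the trained weights
predictions = autoencoder.predict(df_test_norm_y)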
