To log hparams without using Keras, I'm doing the following as suggested in the tf code here:
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

with tf.summary.create_file_writer(model_dir).as_default():
    hp_learning_rate = hp.HParam("learning_rate", hp.RealInterval(0.00001, 0.1))
    hp_distance_margin = hp.HParam("distance_margin", hp.RealInterval(0.1, 1.0))

    hparams_list = [
        hp_learning_rate,
        hp_distance_margin
    ]

    metrics_to_monitor = [
        hp.Metric("metrics_standalone/auc", group="validation"),
        hp.Metric("loss", group="train", display_name="training loss"),
    ]

    hp.hparams_config(hparams=hparams_list, metrics=metrics_to_monitor)

    hparams = {
        hp_learning_rate: params.learning_rate,
        hp_distance_margin: params.distance_margin,
    }
    hp.hparams(hparams)
Note that params is a dictionary object here that I'll pass to the estimator.
Then I train the estimator as usual,
config = tf.estimator.RunConfig(model_dir=params.model_dir)
estimator = tf.estimator.Estimator(model_fn, params=params, config=config)
train_spec = tf.estimator.TrainSpec(...)
eval_spec = tf.estimator.EvalSpec(...)
tf.estimator.train_and_evaluate(estimator, train_spec=train_spec, eval_spec=eval_spec)
After training, when I launch TensorBoard, the hparams are logged, but I do not see any metrics logged against them.
I further confirmed that the metrics show up in the Scalars page with the same tag names for both train and validation, i.e. . and ./eval, but the HParams page doesn't see those logged tensors.
How do I use hparams with estimators?
I'm using
tensorboard 2.1.0
tensorflow 2.1.0
tensorflow-estimator 2.1.0
tensorflow-metadata 0.15.2
on Python 3.7.5
Attempt 1:
After some googling, I saw some older TF code where they passed hparams to the params argument of Estimator. So, just to check whether TF2 logs those hparams by itself when they are given, I looked at the Estimator docs, which say:
The params argument contains hyperparameters. It is passed to the
model_fn, if the model_fn has a parameter named "params", and to the
input functions in the same manner. Estimator only passes params
along, it does not inspect it. The structure of params is therefore
entirely up to the developer.
So using hparams as params will not be useful.
Attempt 2:
I suspected that, since Estimators use tensorflow.python.summary instead of tf.summary (which is the default in v2), tensors logged by v1 were probably not accessible, so I also tried to use
with tensorflow.python.summary.FileWriter(model_dir).as_default()
However, that failed with RuntimeError: tf.summary.FileWriter is not compatible with eager execution. Use tf.contrib.summary instead.
Update: I ran it with eager execution disabled. Now even the initial hparams logging did not happen. There was no HParams tab in TensorBoard, as it failed with the error
E0129 13:03:07.656290 21584 hparams_plugin.py:104] HParams error: Can't find an HParams-plugin experiment data in the log directory. Note that it takes some time to scan the log directory; if you just started Tensorboard it could be that we haven't finished scanning it yet. Consider trying again in a few seconds.
Is there a way to make tensorboard read already logged metric tensors and link them with hparams?
The culprit seems to be:
# This doesn't seem to be compatible with the Estimator API
hp.hparams_config(hparams=hparams_list, metrics=metrics_to_monitor)
Simply calling hp.hparams logs all metrics logged with tf.summary. Then, in TensorBoard, you can filter only the metrics you need and compare trials.
with tf.summary.create_file_writer(train_folder).as_default():
    # params is a dict which contains
    # {'learning_rate': 0.001, 'distance_margin': 0.5, ...}
    hp.hparams(hparams=params)
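For context, here is a minimal end-to-end sketch of how this fits around an Estimator run; writing the hparams to the top-level model_dir, and the params, model_fn, train_spec and eval_spec names, are assumptions mirroring the question:

import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

# One directory per trial; the Estimator writes its train and eval scalars here.
model_dir = params.model_dir

# Log only the hyperparameter values for this trial; no hparams_config() call.
with tf.summary.create_file_writer(model_dir).as_default():
    hp.hparams({"learning_rate": params.learning_rate,
                "distance_margin": params.distance_margin})

# Train as usual; the scalars written under model_dir (and model_dir/eval)
# can then be filtered against the hparams in the HParams dashboard.
config = tf.estimator.RunConfig(model_dir=model_dir)
estimator = tf.estimator.Estimator(model_fn, params=params, config=config)
tf.estimator.train_and_evaluate(estimator, train_spec=train_spec, eval_spec=eval_spec)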
I'm trying to use the Hugging Face transformers pretrained model bert-base-uncased, but I want to increase the dropout. There isn't any mention of this in the from_pretrained method, but Colab ran the object instantiation below without any problem. I saw these dropout parameters in the transformers.BertConfig documentation.
Am I using bert-base-uncased AND changing dropout in the correct way?
model = BertForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path='bert-base-uncased',
    num_labels=2,
    output_attentions=False,
    output_hidden_states=False,
    attention_probs_dropout_prob=0.5,
    hidden_dropout_prob=0.5
)
As Elidor00 already said, your assumption is correct. Similarly, I would suggest using a separate Config, because it is easier to export and less prone to errors. Additionally, someone in the comments asked how to use it via from_pretrained:
from transformers import BertModel, AutoConfig

configuration = AutoConfig.from_pretrained('bert-base-uncased')
configuration.hidden_dropout_prob = 0.5
configuration.attention_probs_dropout_prob = 0.5
bert_model = BertModel.from_pretrained(pretrained_model_name_or_path='bert-base-uncased',
                                       config=configuration)
Yes, this is correct, but note that there are two dropout parameters and that you are using a specific BERT model, namely BertForSequenceClassification.
Also, as suggested by the documentation, you could first define the configuration and then the model in the following way:
from transformers import BertModel, BertConfig
# Initializing a BERT bert-base-uncased style configuration
configuration = BertConfig()
# Initializing a model from the bert-base-uncased style configuration
model = BertModel(configuration)
# Accessing the model configuration
configuration = model.config
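For completeness, the same config-first pattern can be applied to the model used in the question, BertForSequenceClassification; a minimal sketch, where the dropout values are simply the ones from the question rather than a recommendation:

from transformers import BertConfig, BertForSequenceClassification

# Build the configuration first, overriding the two dropout probabilities,
# then pass it to from_pretrained together with the checkpoint name.
config = BertConfig.from_pretrained('bert-base-uncased',
                                    num_labels=2,
                                    hidden_dropout_prob=0.5,
                                    attention_probs_dropout_prob=0.5)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', config=config)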
I am trying to build a neural network in Python for solving PDEs, and, as such, I have had to write custom training steps. My training function looks like this:
...
tf.enable_eager_execution()

class PDENet:
    ...
    def train_step(self):
        input = self.input
        with tf.GradientTape() as tape, tf.Session() as sess:
            tape.watch(input)
            output = self.model(input)
            self.loss = self.pde_loss(output)  # (network does not use training data)
            grad = tape.gradient(self.loss, self.model.trainable_weights)
            self.optimizer.apply_gradients([(grad, self.model)])
    ...
Due to my hardware, I have no choice but to use tensorflow==1.12.0 and keras==2.2.4.
When I run this code, I get "RuntimeError: Attempting to capture an EagerTensor without building a function". I have seen other posts about this, but all of the answers say to update tensorflow/keras, which I can't, to use tf.enable_eager_execution(), which I've already done, or to use tf.disable_v2_behavior(), which is nonexistent in older versions of tensorflow. Is there anything else I can do to solve this problem? The error makes me think tensorflow wants me to add @tf.function, but that feature also doesn't seem to exist in tensorflow 1.
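For reference, a plain eager-mode training step in TF 1.x normally does not open a tf.Session inside the GradientTape block. The sketch below only illustrates that pattern under the setup described above; the model, pde_loss, and optimizer arguments are assumed stand-ins for the class attributes, not a verified fix:

import tensorflow as tf
tf.enable_eager_execution()

def train_step(model, pde_loss, optimizer, inputs):
    # Everything runs eagerly; gradients come from the tape, not a Session.
    with tf.GradientTape() as tape:
        tape.watch(inputs)
        output = model(inputs)
        loss = pde_loss(output)
    grads = tape.gradient(loss, model.trainable_weights)
    # apply_gradients expects (gradient, variable) pairs.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    return loss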
Is it possible to create a hub module from existing checkpoints without changing the training code?
Yes, absolutely. You need a session with (1) a Module and (2) the proper values in its variables. It doesn't matter if those come from actual training or merely restoring a checkpoint. Given a Python library for model building that knows nothing about TensorFlow Hub, you can have a tool on the side for export to a Hub Module that looks like:
import tensorflow as tf
import tensorflow_hub as hub
from your_library import build_model_body

def module_fn():
    inputs = tf.placeholder(...)
    logits = build_model_body(inputs)
    hub.add_signature(inputs=inputs, outputs=logits)

def main(_):
    spec = hub.create_module_spec(module_fn)
    # Supply a checkpoint trained on a model from the same Python code.
    checkpoint_path = "..."
    # Output will be written here:
    export_path = "..."

    with tf.Graph().as_default():
        module = hub.Module(spec)
        init_fn = tf.contrib.framework.assign_from_checkpoint_fn(
            checkpoint_path, module.variable_map)
        with tf.Session() as session:
            init_fn(session)
            module.export(export_path, session=session)
Fine points to note:
build_model_body() should transform inputs to outputs (say, pixels to feature vectors) as suitable for a Hub module, but not include data reading, or loss and optimizers. For transfer learning, these are best left to the consumer of the module. Some refactoring may be required.
Supplying module.variable_map is essential in order to translate from the plain variable names created by running build_model_body() by itself to the variable names created by instantiating the Module, which live in the scope module/state.
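To illustrate the consumer side, here is a minimal sketch of loading the exported module in TF1 graph mode; the placeholder shape is an assumption and has to match whatever module_fn declared in its signature:

import tensorflow as tf
import tensorflow_hub as hub

export_path = "..."  # the directory written by module.export() above

with tf.Graph().as_default():
    module = hub.Module(export_path)
    inputs = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])  # assumed shape
    outputs = module(inputs)
    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        session.run(tf.tables_initializer())
        # session.run(outputs, feed_dict={inputs: batch_of_inputs})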
What's the simplest way to print accuracy metrics along with the loss when training a pre-canned estimator?
Most tutorials and documentation seem to address the case where you're creating a custom estimator, which seems like overkill if the intention is to use one of the available ones.
tf.contrib.learn had a few (now deprecated) Monitor hooks. TF now suggests using the hook API, but it appears that it doesn't actually come with anything that can use the labels and predictions to generate an accuracy number.
Have you tried tf.contrib.estimator.add_metrics(estimator, metric_fn) (doc)? It takes an initialized estimator (can be pre-canned) and adds to it the metrics defined by metric_fn.
Usage Example:
def custom_metric(labels, predictions):
    # This function will be called by the Estimator, passing its predictions.
    # Let's suppose you want to add the "mean" metric...

    # Accessing the class predictions (careful, the key name may change from one canned Estimator to another):
    predicted_classes = predictions["class_ids"]

    # Defining the metric (value and update tensors):
    custom_metric = tf.metrics.mean(labels, predicted_classes, name="custom_metric")

    # Returning as a dict:
    return {"custom_metric": custom_metric}

# Initializing your canned Estimator:
classifier = tf.estimator.DNNClassifier(feature_columns=columns_feat, hidden_units=[10, 10], n_classes=NUM_CLASSES)

# Adding your custom metrics:
classifier = tf.contrib.estimator.add_metrics(classifier, custom_metric)

# Training/Evaluating:
tf.logging.set_verbosity(tf.logging.INFO)  # Just to have some logs to display for demonstration

train_spec = tf.estimator.TrainSpec(input_fn=lambda: your_train_dataset_function(),
                                    max_steps=TRAIN_STEPS)
eval_spec = tf.estimator.EvalSpec(input_fn=lambda: your_test_dataset_function(),
                                  steps=EVAL_STEPS,
                                  start_delay_secs=EVAL_DELAY,
                                  throttle_secs=EVAL_INTERVAL)

tf.estimator.train_and_evaluate(classifier, train_spec, eval_spec)
Logs:
...
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [20/200]
INFO:tensorflow:Evaluation [40/200]
...
INFO:tensorflow:Evaluation [200/200]
INFO:tensorflow:Finished evaluation at 2018-04-19-09:23:03
INFO:tensorflow:Saving dict for global step 1: accuracy = 0.5668, average_loss = 0.951766, custom_metric = 1.2442, global_step = 1, loss = 95.1766
...
As you can see, the custom_metric is returned along with the default metrics and loss.
In addition to the answer of @Aldream, you can also use TensorBoard to see some graphics of the custom_metric. To do that, add it to a TensorFlow summary like this:
tf.summary.scalar('custom_metric', custom_metric)
The cool thing when you use the tf.estimator.Estimator is that you don't need to add the summaries to a FileWriter, since it's done automatically (merging and saving them every 100 steps by default).
In order to see the TensorBoard you need to open a new terminal and type:
tensorboard --logdir={$MODEL_DIR}
After that you will be able to see the graphics in your browser at localhost:6006.
I'm trying to use google cloud platform to deploy a model to support prediction.
I train the model (locally) with the following instruction
~/$ gcloud ml-engine local train --module-name trainer.task --package-path trainer
and everything works fine (...):
INFO:tensorflow:Restoring parameters from gs://my-bucket1/test2/model.ckpt-45000
INFO:tensorflow:Saving checkpoints for 45001 into gs://my-bucket1/test2/model.ckpt.
INFO:tensorflow:loss = 17471.6, step = 45001
[...]
Loss: 144278.046875
average_loss: 1453.68
global_step: 50000
loss: 144278.0
INFO:tensorflow:Restoring parameters from gs://my-bucket1/test2/model.ckpt-50000
Mean Square Error of Test Set = 593.1018482
But, when I run the following command to create a version,
gcloud ml-engine versions create Mo1 --model mod1 --origin gs://my-bucket1/test2/ --runtime-version 1.3
Then I get the following error.
ERROR: (gcloud.ml-engine.versions.create) FAILED_PRECONDITION: Field: version.deployment_uri
Error: SavedModel directory gs://my-bucket1/test2/ is expected to contain exactly one
of: [saved_model.pb, saved_model.pbtxt].- '#type': type.googleapis.com/google.rpc.BadRequest
fieldViolations:- description: 'SavedModel directory gs://my-bucket1/test2/ is expected
to contain exactly one of: [saved_model.pb, saved_model.pbtxt].'
field: version.deployment_uri
Here is a screenshot of my bucket. I have a saved model with 'pbtxt' format
my-bucket-image
Finally, here is the piece of code where I save the model to the bucket.
regressor = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                      hidden_units=[40, 30, 20],
                                      model_dir="gs://my-bucket1/test2",
                                      optimizer='RMSProp')
You'll notice that the file in your screenshot is graph.pbtxt whereas saved_model.pb{txt} is needed.
Note that just renaming the file generally will not be sufficient. The training process outputs checkpoints periodically in case restarts happen and recovery is needed. However, those checkpoints (and corresponding graphs) are the training graph. Training graphs tend to have things like file readers, input queues, dropout layers, etc. which are not appropriate for serving.
Instead, TensorFlow requires you to explicitly export a separate graph for serving. You can do this in one of two ways:
During training (typically, after training is complete)
As a separate process after training.
During/After Training
For this, I'll refer you to the Census sample.
First, you'll need a "Serving Input Function", such as:
def serving_input_fn():
    """Build the serving inputs."""
    inputs = {}
    for feat in INPUT_COLUMNS:
        inputs[feat.name] = tf.placeholder(shape=[None], dtype=feat.dtype)
    features = {
        key: tf.expand_dims(tensor, -1)
        for key, tensor in inputs.iteritems()
    }
    return tf.contrib.learn.InputFnOps(features, None, inputs)
Then you can simply call:
regressor.export_savedmodel("path/to/model", serving_input_fn)
Or, if you're using learn_runner/Experiment, you'll need to pass an ExportStrategy like the following to the constructor of Experiment:
export_strategies=[saved_model_export_utils.make_export_strategy(
    serving_input_fn,
    exports_to_keep=1,
    default_output_alternative_key=None,
)]
After Training
Almost exactly the same steps as above, but just in a separate Python script you can run after training is over (in your case, this is beneficial because you won't have to retrain). The basic idea is to construct the Estimator with the same model_dir used in training, then to call export as above, something like:
def serving_input_fn():
    """Build the serving inputs."""
    inputs = {}
    for feat in INPUT_COLUMNS:
        inputs[feat.name] = tf.placeholder(shape=[None], dtype=feat.dtype)
    features = {
        key: tf.expand_dims(tensor, -1)
        for key, tensor in inputs.iteritems()
    }
    return tf.contrib.learn.InputFnOps(features, None, inputs)

regressor = tf.contrib.learn.DNNRegressor(
    feature_columns=feature_columns,
    hidden_units=[40, 30, 20],
    model_dir="gs://my-bucket1/test2",
    optimizer='RMSProp'
)
regressor.export_savedmodel("my_model", serving_input_fn)
EDIT 09/12/2017
One slight change is needed to your training code. You are using tf.estimator.DNNRegressor, but that was introduced in TensorFlow 1.3; CloudML Engine only officially supports TensorFlow 1.2, so you'll need to use tf.contrib.learn.DNNRegressor instead. They are very similar, but one notable difference is that you'll need to use the fit method instead of train.
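For example, the training call would then look roughly like this (a sketch; train_input_fn is an assumed name, and the step count mirrors the 50000 steps in your logs):

regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_columns,
                                          hidden_units=[40, 30, 20],
                                          model_dir="gs://my-bucket1/test2",
                                          optimizer='RMSProp')
# Note fit() rather than train():
regressor.fit(input_fn=train_input_fn, steps=50000)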
I had the same error message; in my case there were two problems:
The path to the bucket was misspelled.
A wrong saved_model.pbtxt file (after the first error message I put another, renamed .pbtxt file in the same bucket with my model classes, and this made the problem persist even after the path was corrected).
The command worked after deleting the wrong file and correcting the path.
I hope this helps too.