With Tensorflow profiler, I am getting a lot of warning messages of the following form "Node gradients/resnet_model/IdentityN_9_grad/cond/Pad_1 incompatible shapes: Shapes (?, 11, 11, 64) and (128, 64, 11, 11) are not compatible during training." However, the training process does not crash. Can somebody explain the nature of those messages?
You have an undefined number of patches in one of your tf.placeholder inputs (the dimension is None, which the profiler shows as '?').
TensorFlow itself accepts this, but the profiler does not support it.
Set a hard-coded value for this dimension (128 in your case) and the warnings will no longer appear.
Please note that this profiler does not seem to be maintained, and it may be disabled in builds of TF 2 and later.
As for the follow-up question in the comments, "what profiler should I use for Tensorflow?", the answer is a bit complicated because you do not state what you want to explore in your TF script. Note that the profiler above also includes an advisor.
Assuming you want to find a bottleneck in a TF model by checking compute time and memory on a device, the easiest approach is to create a timeline JSON file and read it in the Chrome browser.
The Python script would be something like:
# build summary for logs (to be read with tensorboard):
# ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
log_dir = './logs_' + datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
if not os.path.exists(log_dir):
    os.makedirs(log_dir)
tf.summary.scalar('loss', self.loss)
tf.summary.scalar('lr', self.lr)
tf.summary.scalar('psnr', self.psnr)
writer = tf.summary.FileWriter(log_dir, self.sess.graph)
merged = tf.summary.merge_all()
clip_all_weights = tf.get_collection("max_norm")
# enable full trace and metadata for tensorboard:
# ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
# request one TF iteration (this is a sample with a denoising CNN)
_, loss, summary = self.sess.run([self.train_op, self.loss, merged],
                                 feed_dict={self.Y_: batch_clean, self.X: batch_noisy, self.lr: lr[epoch], self.psnr: self.psnr_tmp, self.is_training: True},
                                 options=run_options,
                                 run_metadata=run_metadata)
self.sess.run(clip_all_weights)
# add metadata and summary to log file:
# ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
writer.add_run_metadata(run_metadata, 'iter_%06d' % iter_num)
writer.add_summary(summary, iter_num)
# Create the Timeline object from metadata, and write it to a json file.
# Point Chrome browser to "chrome://tracing/" to load this json file.
# ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
tl = timeline.Timeline(run_metadata.step_stats)
ctf = tl.generate_chrome_trace_format(show_memory=True) #show_dataflow=True,
with open('json.timeline/timeline_%i.json' % iter_num, 'w') as f:
    f.write(ctf)
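Besides loading the file in chrome://tracing/, the same JSON can be summarized with a few lines of standard-library Python. The following is only a sketch: it assumes the file follows the Chrome trace event format, where complete events have "ph" set to "X" and a "dur" field in microseconds. The helper name total_duration_by_name and the sample events are made up for illustration; in practice the dictionary would come from json.load on one of the timeline files written above.

```python
import json
from collections import defaultdict


def total_duration_by_name(trace):
    """Sum 'dur' (microseconds) of complete events (ph == 'X') per event name."""
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        if event.get("ph") == "X":
            totals[event["name"]] += event.get("dur", 0)
    # Largest total first, so likely bottlenecks appear at the top.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample standing in for the content of a real timeline file.
sample_trace = json.loads("""
{"traceEvents": [
  {"ph": "M", "name": "process_name", "pid": 0, "args": {"name": "hypothetical device"}},
  {"ph": "X", "name": "Conv2D", "pid": 0, "tid": 0, "ts": 0, "dur": 120},
  {"ph": "X", "name": "Relu", "pid": 0, "tid": 0, "ts": 120, "dur": 15},
  {"ph": "X", "name": "Conv2D", "pid": 0, "tid": 0, "ts": 135, "dur": 110}
]}
""")

ranking = total_duration_by_name(sample_trace)
for name, micros in ranking:
    print(f"{name}: {micros} us")
```

Sorting by total duration is a deliberate simplification; it ignores overlap between events that run in parallel on different threads.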
I'm new to Tensorflow and I'm trying to import a frozen graph (.pb file) that was trained in Python into a Java project using Deeplearning4j.
It seems that the model was saved successfully and it is working in Python, but when I try to import it with DL4J I'm getting the following issue and I don't know why:
Exception in thread "main" java.lang.IllegalStateException: Could not find class for TF Ops: TensorListFromTensor
at org.nd4j.common.base.Preconditions.throwStateEx(Preconditions.java:639)
at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:301)
at org.nd4j.imports.graphmapper.tf.TFGraphMapper.importGraph(TFGraphMapper.java:283)
at org.nd4j.imports.graphmapper.tf.TFGraphMapper.importGraph(TFGraphMapper.java:141)
at org.nd4j.imports.graphmapper.tf.TFGraphMapper.importGraph(TFGraphMapper.java:87)
at org.nd4j.imports.graphmapper.tf.TFGraphMapper.importGraph(TFGraphMapper.java:73)
at MLModel.loadModel(MLModel.java:30)
This is my model in Python:
def RNN():
    inputs = tf.keras.layers.Input(name='inputs', shape=[max_len])
    layer = tf.keras.layers.Embedding(max_words, 50, input_length=max_len)(inputs)
    layer = tf.keras.layers.LSTM(64)(layer)
    layer = tf.keras.layers.Dense(256, name='FC1')(layer)
    layer = tf.keras.layers.Activation('relu')(layer)
    layer = tf.keras.layers.Dropout(0.5)(layer)
    layer = tf.keras.layers.Dense(12, name='out_layer')(layer)
    layer = tf.keras.layers.Activation('softmax')(layer)
    model = tf.keras.models.Model(inputs=inputs, outputs=layer)
    return model
I exported the model by following this blog post: Save, Load and Inference From TensorFlow 2.x Frozen Graph
And this is how I'm trying to import the model in Java with DeepLearning4J:
public static void loadModel(String filepath) throws Exception {
    File file = new File(filepath);
    if (!file.exists()) {
        file = new File(filepath);
    }
    sd = TFGraphMapper.importGraph(file);
    if (sd == null) {
        throw new Exception("Error loading model : " + file);
    }
}
I'm getting the exception in sd = TFGraphMapper.importGraph(file);
Does anyone know if I'm missing something?
That is the old model import. Please use the new one. The old one is not and will not be supported. You can find that here:
https://deeplearning4j.konduit.ai/samediff/explanation/model-import-framework
Both tensorflow and onnx work similarly. For tensorflow use:
//create the framework importer
TensorflowFrameworkImporter tensorflowFrameworkImporter = new TensorflowFrameworkImporter();
File pathToPbFile = ...;
SameDiff graph = tensorflowFrameworkImporter.runImport(pathToPbFile.getAbsolutePath(),Collections.emptyMap());
If something doesn't work for you, file an issue on the GitHub repo: https://github.com/deeplearning4j/deeplearning4j/issues/new
Also note that if you use the tf.keras API, you can also import the model using the Keras HDF5 format (the old one).
For many graphs, you may also need to save the model and freeze it. You can do that like this:
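import tensorflow as tf
from tensorflow.core.framework.graph_pb2 import GraphDef
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

def convert_saved_model(saved_model_dir) -> GraphDef:
    """
    Convert the saved model (expanded as a directory)
    to a frozen graph def
    :param saved_model_dir: the input model directory
    :return: the loaded graph def with all parameters in the model
    """
    saved_model = tf.saved_model.load(saved_model_dir)
    # the signature is a concrete function, not yet a graph def
    concrete_func = saved_model.signatures['serving_default']
    # inline all variables as constants so the graph is self-contained
    frozen = convert_variables_to_constants_v2(concrete_func)
    return frozen.graph.as_graph_def()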
We publish more code and utilities for that kind of thing here:
https://github.com/deeplearning4j/deeplearning4j/tree/master/contrib/omnihub/src/omnihub/frameworks
Please help! I am new to TensorBoard and have been trying to use it to visualize the metrics of my model, but I get a weird error.
So I used a simple TensorBoard example from here: https://www.easy-tensorflow.com/tf-tutorials/basics/introduction-to-tensorboard , but I still get the same error: when I run the command tensorboard --logdir="./graphs" to visualize the board, I get the local address, but the page contains nothing. When I check the content of the created log file, this is all I find:
import tensorflow as tf
tf.reset_default_graph() # To clear the defined variables and operations of the previous cell
# create graph
a = tf.constant(2)
b = tf.constant(3)
c = tf.add(a, b)
# creating the writer out of the session
# writer = tf.summary.FileWriter('./graphs', tf.get_default_graph())
# launch the graph in a session
with tf.Session() as sess:
    # or creating the writer inside the session
    writer = tf.summary.FileWriter('./graphs', sess.graph)
    print(sess.run(c))
This is just a guess, but regarding these lines:
# creating the writer out of the session
# writer = tf.summary.FileWriter('./graphs', tf.get_default_graph())
Is it necessary to create the writer there?
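When the board comes up empty, one quick check is whether the directory passed to --logdir actually contains event files and whether they have any content. The sketch below uses only the standard library and relies on the fact that TensorBoard event file names start with events.out.tfevents. The helper name find_event_files and the fake file in the demo are made up for illustration; on a real run you would pass './graphs' instead of the temporary directory.

```python
import os
import tempfile


def find_event_files(logdir):
    """Return (path, size_in_bytes) for every event file found under logdir."""
    found = []
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            if name.startswith("events.out.tfevents."):
                path = os.path.join(root, name)
                found.append((path, os.path.getsize(path)))
    return sorted(found)


# Demo on a temporary directory with a made-up, empty event file.
with tempfile.TemporaryDirectory() as logdir:
    fake = os.path.join(logdir, "events.out.tfevents.0000000000.hypothetical-host")
    open(fake, "wb").close()
    event_files = find_event_files(logdir)
    for path, size in event_files:
        print(os.path.basename(path), size, "bytes")
```

An empty list suggests that --logdir points at the wrong directory; a file of zero or very few bytes suggests that nothing was flushed to disk yet.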
I am following this tutorial for image detection using Matterport repo.
I tried following this guide and edited the code to
How can I edit the following code to visualize the tensorboard ?
import tensorflow as tf
import datetime
%load_ext tensorboard
sess = tf.Session()
file_writer = tf.summary.FileWriter('/path/to/logs', sess.graph)
And then in the model area
# prepare config
config = KangarooConfig()
config.display()
# define the model
model = MaskRCNN(mode='training', model_dir='./', config=config)
model.keras_model.metrics_tensors = []
# Tensorflow board
logdir = os.path.join(
    "logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
# load weights (mscoco) and exclude the output layers
model.load_weights('mask_rcnn_coco.h5',
                   by_name=True,
                   exclude=[
                       "mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox",
                       "mrcnn_mask"
                   ])
# train weights (output layers or 'heads')
model.train(train_set,
            test_set,
            learning_rate=config.LEARNING_RATE,
            epochs=5,
            layers='heads')
I am not sure where to pass callbacks=[tensorboard_callback].
If you look closely at the source code documentation of model.train, there is a parameter called custom_callbacks, which defaults to None.
That is where your callback goes, so to train with a custom callback you need to call it like this:
model.train(train_set,
            test_set,
            learning_rate=config.LEARNING_RATE,
            custom_callbacks=[tensorboard_callback],
            epochs=5,
            layers='heads')
You only have to open Anaconda Prompt and run tensorboard --logdir=yourlogdirectory, where yourlogdirectory is the directory containing the model checkpoint.
It should look something like this: logs\xxxxxx20200528T1755, where xxxxxx stands for the name you gave to your configuration.
This command will print a web address; copy it into your browser of choice.
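If several runs accumulate under logs, the timestamp suffix in the directory name can be used to pick the most recent one. The following is a minimal sketch using only the standard library. It assumes every run directory ends in a YYYYMMDDTHHMM stamp like the example above; the helper name latest_run_dir and the run names in the demo are made up.

```python
import os
import re
import tempfile
from datetime import datetime

# Timestamp suffix in the form shown above, e.g. 20200528T1755
STAMP = re.compile(r"(\d{8}T\d{4})$")


def latest_run_dir(log_root):
    """Return the sub-directory of log_root with the newest timestamp suffix, or None."""
    runs = []
    for name in os.listdir(log_root):
        match = STAMP.search(name)
        if match and os.path.isdir(os.path.join(log_root, name)):
            stamp = datetime.strptime(match.group(1), "%Y%m%dT%H%M")
            runs.append((stamp, name))
    if not runs:
        return None
    return os.path.join(log_root, max(runs)[1])


# Demo with made-up run names in a temporary directory.
with tempfile.TemporaryDirectory() as log_root:
    for run in ("kangaroo20200528T1755", "kangaroo20200601T0930"):
        os.makedirs(os.path.join(log_root, run))
    latest_name = os.path.basename(latest_run_dir(log_root))
    print("tensorboard --logdir=" + os.path.join("logs", latest_name))
```

Parsing the stamp instead of sorting by modification time keeps the result stable even if an older run directory is touched later.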
I tried creating a model using tensorflow. When I tried executing it shows me
The other files are at this link: github.com/llSourcell/tensorflow_chatbot
def train():
    enc_train, dec_train = data_utils.prepare_custom_data(
        gConfig['working_directory'])
    train_set = read_data(enc_train, dec_train)
def seq2seq_f(encoder_inputs,decoder_inputs,do_decode):
    return tf.nn.seq2seq.embedding_attention_seq2seq(
        encoder_inputs, decoder_inputs, cell,
        num_encoder_symbols=source_vocab_size,
        num_decoder_symbols=target_vocab_size,
        embedding_size=size,
        output_projection=output_projection,
        feed_previous=do_decode)
with tf.Session(config=config) as sess:
    model = create_model(sess, False)
    while True:
        sess.run(model)
        checkpoint_path = os.path.join(gConfig['working_directory'], 'seq2seq.ckpt')
        model.saver.save(sess, checkpoint_path, global_step=model.global_step)
Other than this, the other Python files I've used are in the GitHub link specified in the comments section below.
This is the code defining create_model in the execute.py file:
def create_model(session, forward_only):
    """Create model and initialize or load parameters"""
    model = seq2seq_model.Seq2SeqModel(gConfig['enc_vocab_size'], gConfig['dec_vocab_size'], _buckets, gConfig['layer_size'], gConfig['num_layers'], gConfig['max_gradient_norm'], gConfig['batch_size'], gConfig['learning_rate'], gConfig['learning_rate_decay_factor'], forward_only=forward_only)
    if 'pretrained_model' in gConfig:
        model.saver.restore(session, gConfig['pretrained_model'])
        return model
    ckpt = tf.train.get_checkpoint_state(gConfig['working_directory'])
    # the checkpoint filename has changed in recent versions of tensorflow
    checkpoint_suffix = ""
    if tf.__version__ > "0.12":
        checkpoint_suffix = ".index"
    if ckpt and tf.gfile.Exists(ckpt.model_checkpoint_path + checkpoint_suffix):
        print("Reading model parameters from %s" % ckpt.model_checkpoint_path)
        model.saver.restore(session, ckpt.model_checkpoint_path)
    else:
        print("Created model with fresh parameters.")
        session.run(tf.initialize_all_variables())
    return model
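A side note on the version check in create_model: it compares version strings, and Python orders strings character by character, so "0.9" counts as newer than "0.12". The sketch below shows a numeric comparison using only the standard library. The helper name version_tuple and the example version string are made up, and the helper assumes plain dotted versions, optionally followed by a suffix such as -rc1.

```python
import re


def version_tuple(version):
    """Turn '1.15.0' or '2.4.0-rc1' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


# String comparison is lexicographic: '9' sorts after '1'.
print("0.9" > "0.12")
# Tuple comparison is numeric: 9 is smaller than 12.
print(version_tuple("0.9") > version_tuple("0.12"))

# Hypothetical stand-in for tf.__version__
tf_version = "1.15.0"
checkpoint_suffix = ".index" if version_tuple(tf_version) > (0, 12) else ""
print(repr(checkpoint_suffix))
```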
Okay, it seems like you have copied code but did not structure it. If create_model() is defined in another file, then you have to import it. Have you done that (i.e. from file_with_methods import create_model)? You should consider editing your post and adding more of your code if you want us to help.
Alternative: you could also clone the GitHub repository (the one you shared in your comment) and change whatever you need in the execute.py file. This way you keep the "hierarchy" that the owner uses, and you can add your own code where needed.
I used the retrain.py script to retrain the Inception V3 model. From this script, I get several files: the output_graph.pb file, labels.txt, and the 3 checkpoint files (.meta, .data, .index) produced with writer_version=tf.train.SaverDef.V2. Following some ideas, I created my freezing script.
input_graph_name = "output_graph.pb"
output_graph_name = "frozen_graph.pb"
checkpoint_path = "C:\\Program Files (x86)\\Python 3.5.2\\tensorflow\\final\\output\\tmp\\map-0"
input_graph_path = os.path.join('C:\\Program Files (x86)\\Python 3.5.2\\tensorflow\\final\\output\\tmp', input_graph_name)
input_saver_def_path = ""
input_binary = True
output_node_names = "final_result"
restore_op_name = "save/restore_all"
filename_tensor_name = "save/Const:0"
freeze_graph.freeze_graph(input_graph_path,
                          input_saver_def_path,
                          input_binary,
                          checkpoint_path,
                          output_node_names,
                          restore_op_name,
                          filename_tensor_name,
                          output_graph_path,
                          clear_devices,
                          "")
However, I am getting the error:
TypeError: names_to_saveables must be a dict mapping string names to Tensors/Variables. Not a variable: Tensor("final_training_ops/biases/final_biases:0", shape=(2,), dtype=float32).
I know there is a node called final_training_ops/biases/final_biases:0 in retrain.py, but I am only interested in the final_result node, which will be used to get the classification result. Some posts on the internet mention using the .pbtxt + .ckpt files (with writer_version=tf.train.SaverDef.V1) to freeze the model, but when I try that my computer freezes. I hope someone can help me figure out what to do.