Visualize TFLite graph and get intermediate values of a particular node? - python

I was wondering if there is a way to know the list of inputs and outputs for a particular node in tflite? I know that I can get input/output details, but this does not allow me to reconstruct the computation process that happens inside an Interpreter. So what I do is:
interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
interpreter.get_tensor_details()
The last 3 commands basically give me dictionaries which don't seem to have the necessary information.
So I was wondering if there is a way to know where each node's outputs go? Surely the Interpreter knows this somehow. Can we? Thanks.

Note: this answer was written for Tensorflow 1.x and, while the concept and core idea remains the same in TensorFlow 2.x, the commands in this answer might be deprecated.
The mechanism of TF-Lite makes the whole process of inspecting the graph and getting the intermediate values of inner nodes a bit tricky. The get_tensor(...) method suggested by the other answer does not work.
How to visualize TF-Lite inference graph?
TensorFlow Lite models can be visualized using the visualize.py script in the TensorFlow Lite repository. You just need to:
Clone the TensorFlow repository
Run the visualize.py script with bazel:
bazel run //tensorflow/lite/tools:visualize \
model.tflite \
visualized_model.html
Do the nodes in my TF model have an equivalent in TF-Lite?
NO! In fact, TF-Lite can modify your graph so that it becomes more optimal. Here are some words about it from the TF-Lite documentation:
A number of TensorFlow operations can be processed by TensorFlow Lite even though they have no direct equivalent. This is the case for operations that can be simply removed from the graph (tf.identity), replaced by tensors (tf.placeholder), or fused into more complex operations (tf.nn.bias_add). Even some supported operations may sometimes be removed through one of these processes.
Moreover, the TF-Lite API currently doesn't allow you to get node correspondence; it's hard to interpret the inner format of TF-Lite. So you can't get the intermediate outputs for arbitrary nodes, and that's even before the additional issue described below...
Can I get intermediate values of some TF-Lite nodes?
NO! Here, I will explain why get_tensor(...) wouldn't work in TF-Lite. Suppose that in the inner representation the graph contains 3 tensors, together with some dense operations (nodes) in between (you can think of tensor1 as the input and tensor3 as the output of your model). During inference of this particular graph, TF-Lite only needs 2 buffers; let's see how.
First, use tensor1 to compute tensor2 by applying the dense operation. This only requires 2 buffers to store the values:
            dense              dense
[tensor1] -------> [tensor2] -------> [tensor3]
 ^^^^^^^            ^^^^^^^
 bufferA            bufferB
Second, use the value of tensor2 stored in bufferB to compute tensor3... but wait! We don't need bufferA anymore, so let's use it to store the value of tensor3:
            dense              dense
[tensor1] -------> [tensor2] -------> [tensor3]
                    ^^^^^^^            ^^^^^^^
                    bufferB            bufferA
Now comes the tricky part. The "output value" of tensor1 will still point to bufferA, which now holds the values of tensor3. So if you call get_tensor(...) for the 1st tensor, you'll get incorrect values. The documentation of this method even states:
This function cannot be used to read intermediate results.
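For example, here is a minimal sketch (the model path and the tensor index 3 are placeholders) showing why you should not rely on this: reading an intermediate tensor after invoke() just returns whatever happens to be in the reused buffer.
import numpy as np
import tensorflow as tf
interpreter = tf.lite.Interpreter(model_path="model.tflite")  # placeholder path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
dummy_input = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy_input)
interpreter.invoke()
# 3 is a made-up index of some intermediate tensor; its buffer may already
# have been reused, so these values cannot be trusted.
stale_values = interpreter.get_tensor(3)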
How to get around this?
Easy but limited way. During conversion, you can specify the names of the nodes whose output tensors you want to be able to read:
tflite_convert \
    -- # other options of your model
    --output_arrays="output_node,intermediate/node/n1,intermediate/node/n2"
Hard but flexible way. You can compile TF-Lite with Bazel (using these instructions). Then you can inject some logging code into Interpreter::Invoke() in the file tensorflow/lite/interpreter.cc. An ugly hack, but it works.

As @FalconUA has pointed out, we cannot directly get intermediate inputs and outputs from a TFLite model. But we can get the inputs and outputs of layers by modifying the model buffer. This repo shows how it is done. We need to modify the flatbuffer schema for this to work; the modified TFLite schema (the tflite folder in the repo) is available in the repo.
For completeness, below is the relevant code:
def buffer_change_output_tensor_to(model_buffer, new_tensor_i):
    # From https://github.com/raymond-li/tflite_tensor_outputter
    # Sets subgraph 0's output(s) to new_tensor_i.
    # 'tflite_model' is the Python module generated from the modified schema in that repo.
    # Reads model_buffer as a proper flatbuffer file and gets the offset programmatically.
    # It might be much more efficient if Model.subgraphs[0].outputs[] was set to a list of all the tensor indices.
    fb_model_root = tflite_model.Model.GetRootAsModel(model_buffer, 0)
    # OutputsOffset is a custom function added to the generated schema; it returns
    # the file offset of subgraph 0's outputs vector.
    output_tensor_index_offset = fb_model_root.Subgraphs(0).OutputsOffset(0)
    # Flatbuffer scalars are stored in little-endian.
    new_tensor_i_bytes = bytes([
        new_tensor_i & 0x000000FF,
        (new_tensor_i & 0x0000FF00) >> 8,
        (new_tensor_i & 0x00FF0000) >> 16,
        (new_tensor_i & 0xFF000000) >> 24,
    ])
    # Replace the 4 bytes corresponding to the first output tensor index.
    return model_buffer[:output_tensor_index_offset] \
        + new_tensor_i_bytes \
        + model_buffer[output_tensor_index_offset + 4:]
def get_tensor(path_tflite, tensor_id, input_tensor):
    # input_tensor: the (already preprocessed) input array to run the model on
    with open(path_tflite, 'rb') as fp:
        model_buffer = fp.read()
    # Rewrite the model so that tensor_id becomes the output of subgraph 0.
    model_buffer = buffer_change_output_tensor_to(model_buffer, int(tensor_id))
    interpreter = tf.lite.Interpreter(model_content=model_buffer)
    interpreter.allocate_tensors()
    tensor_details = interpreter._get_tensor_details(tensor_id)
    tensor_name = tensor_details['name']
    input_details = interpreter.get_input_details()
    interpreter.set_tensor(input_details[0]['index'], input_tensor)
    interpreter.invoke()
    tensor = interpreter.get_tensor(tensor_id)
    return tensor
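A hypothetical usage sketch (the model path, tensor index and input shape below are placeholders; pick a real tensor index from interpreter.get_tensor_details()):
import numpy as np
# Placeholder input for a model that expects a 1x224x224x3 float image.
dummy_input = np.random.rand(1, 224, 224, 3).astype(np.float32)
# Dump the activations of tensor 42 (a made-up index) for this input.
intermediate = get_tensor('model.tflite', 42, dummy_input)
print(intermediate.shape, intermediate.dtype)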

Related

Getting a prediction from an ONNX model in python

I can't find anyone who explains to a layman how to load an onnx model into a python script, then use that model to make a prediction when fed an image. All I could find were these lines of code:
sess = rt.InferenceSession("onnx_model.onnx")
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name
pred = sess.run([label_name], {input_name: X.astype(np.float32)})[0]
But I don't know what any of that means. And everywhere I look, everybody already seems to know what they mean, so nobody's explaining it. That would be one thing if I could just run this code, but I can't. It gives me this error:
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid rank for input: Input3 Got: 2 Expected: 4 Please fix either the inputs or the model.
So I need to actually know what those things mean so I can figure out how to fix the error. Will someone knowledgeable please explain?
Let's first start by going over the code you provided, to make everything clear.
sess = rt.InferenceSession("onnx_model.onnx")
This line loads the model into a session object. This means that the layers, functions and weights used in the model are made ready to perform inferences.
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name
The two methods get_inputs and get_outputs each retrieve some metadata about the model: what inputs the model expects and what outputs it can provide. In these lines, only the first input and the first output are used, and of each, only the name is read and saved into a variable.
For the last line, let's tackle that part by part.
pred = sess.run(...)[0]
This performs an inference on the model. We'll go over the inputs to this method next; for now, note that the output is a list of outputs, each of which is a numpy array. In this case only the first output in the list is used and saved to the pred variable.
([label_name], {input_name: X.astype(np.float32)})
These are the arguments to sess.run. The first is a list of names of the outputs you want the session to compute. The second is a dict mapping each input's name to a numpy array. These arrays are expected to have the same dimensions as the ones supplied during creation of the model, and similarly their types should match the types used during creation of the model.
The error you encountered indicates that the supplied array doesn't have the expected number of dimensions: the model expects 4, but you supplied 2.
To see exactly what shape and data type the input array should have, you can use a visualization tool like Netron.
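You can also ask the session itself. Here is a minimal sketch, reusing the X from your code; the (batch, channels, height, width) reshape to 1x28x28 is only a guess for Input3, so substitute whatever shape gets printed for your model:
import numpy as np
import onnxruntime as rt
sess = rt.InferenceSession("onnx_model.onnx")
inp = sess.get_inputs()[0]
out = sess.get_outputs()[0]
print(inp.name, inp.shape, inp.type)  # e.g. Input3 [1, 1, 28, 28] tensor(float)
# Reshape the flat 2-D input into the 4-D layout the model reports.
X4 = X.astype(np.float32).reshape(-1, 1, 28, 28)
pred = sess.run([out.name], {inp.name: X4})[0]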

RNN with inconsistent (repeated) padding (using Pytorch's Pack_padded_sequence)

Following the example from PyTorch docs I am trying to solve a problem where the padding is inconsistent rather than at the end of the tensor for each batch (in other words, no pun intended, I have a left-censored and right-censored problem across my batches):
# Data structure example from docs
seq = torch.tensor([[1,2,0], [3,0,0], [4,5,6]])
# Data structure of my problem
inconsistent_seq = torch.tensor([[1,2,0], [0,3,0], [0,5,6]])
lens = ...?
packed = pack_padded_sequence(seq, lens, batch_first=True, enforce_sorted=False)
How can I solve the problem of masking these padded 0’s when running them through an LSTM using (preferably) PyTorch functionality?
I "solved" this by essentially reindexing my data and padding the left-censored data with 0's (which makes sense for my problem). I also injected an extra tensor into the input dimension to track this padding. I then masked the right-censored data using the pack_padded_sequence method from the PyTorch library. A good source is here:
https://www.kdnuggets.com/2018/06/taming-lstms-variable-sized-mini-batches-pytorch.html
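A rough sketch of that reindexing idea (assuming 0 only ever appears as padding, and ignoring the extra padding-tracking tensor) looks like this:
import torch
from torch.nn.utils.rnn import pack_padded_sequence
inconsistent_seq = torch.tensor([[1, 2, 0], [0, 3, 0], [0, 5, 6]])
# Move the non-padding values of each row to the front so the zeros become
# ordinary right padding, and record the true lengths.
lens = (inconsistent_seq != 0).sum(dim=1)            # tensor([2, 1, 2])
right_padded = torch.zeros_like(inconsistent_seq)
for i, row in enumerate(inconsistent_seq):
    vals = row[row != 0]
    right_padded[i, :len(vals)] = vals
packed = pack_padded_sequence(right_padded, lens, batch_first=True,
                              enforce_sorted=False)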

TensorFlow Federated: How can I write an Input Spec for a model with more than one input

I'm trying to make an image captioning model using the federated learning library provided by TensorFlow, but I'm stuck on this error:
Input 0 of layer dense is incompatible with the layer: : expected min_ndim=2, found ndim=1.
this is my input_spec:
input_spec = collections.OrderedDict(
    x=(tf.TensorSpec(shape=(2048,), dtype=tf.float32),
       tf.TensorSpec(shape=(34,), dtype=tf.int32)),
    y=tf.TensorSpec(shape=(None), dtype=tf.int32))
The model takes image features as the first input and a list of vocabulary as a second input, but I can't express this in the input_spec variable. I tried expressing it as a list of lists but it still didn't work. What can I try next?
Great question! It looks to me like this error is coming out of TensorFlow proper, indicating that you probably have the correct nested structure but the leaves may be off. Your input spec looks like it "should work" from TFF's perspective, so it is probably slightly mismatched with the data you have.
The first thing I would try: if you have an example tf.data.Dataset that will be passed in to your client computation, you can simply read input_spec directly off this dataset as its element_spec attribute. This would look something like:
# ds = example dataset
input_spec = ds.element_spec
This is the easiest path. If you have something like "lists of lists of numpy arrays", there is still a way for you to pull this information off the data itself--the following code snippet should get you there:
# data = list of list of numpy arrays
input_spec = tf.nest.map_structure(lambda x: tf.TensorSpec(x.shape, x.dtype), data)
Finally, if you have a list of lists of tf.Tensors, TensorFlow provides a similar function:
# tensor_structure = list of lists of tensors
tf.nest.map_structure(tf.TensorSpec.from_tensor, tensor_structure)
In short, I would recommend not specifying input_spec by hand, but rather letting the data tell you what its input spec should be.
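For example, with a hypothetical dataset shaped like the one in your question (the tensors below are just placeholders), the spec comes for free:
import collections
import tensorflow as tf
features = tf.random.normal([10, 2048])        # image feature vectors
captions = tf.zeros([10, 34], dtype=tf.int32)  # tokenized captions
labels = tf.zeros([10], dtype=tf.int32)
ds = tf.data.Dataset.from_tensor_slices(
    collections.OrderedDict(x=(features, captions), y=labels)).batch(2)
input_spec = ds.element_spec
# OrderedDict([('x', (TensorSpec(shape=(None, 2048), dtype=tf.float32, ...),
#                     TensorSpec(shape=(None, 34), dtype=tf.int32, ...))),
#              ('y', TensorSpec(shape=(None,), dtype=tf.int32, ...))])
Note the leading None batch dimension on each leaf after .batch(...); the ndim=1 in your error suggests the data reaching the Dense layer may be unbatched, which is one mismatch worth checking.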

How do I allow a text input to a TensorFlow model?

I'm working on a custom text classification model in TensorFlow, and would now like to set it up with TensorFlow serving for production deployment. The model predicts on the basis of a text embedding that's computed via a separate model, and that model requires the raw text to be encoded as a vector.
I have this working in a somewhat disjointed way right now, where one service does all the text preprocessing and then computes the embedding, which is then sent to the text classifier as the embedded text vector. It would be nice if we could bundle this all into one TensorFlow serving model, especially the initial text preprocessing step.
And that's where I'm stuck. How do you construct a Tensor (or other TensorFlow primitive) that is a raw text input? And do you need to do anything special to earmark the lookup table for the token-vector component mapping so that it gets saved out as part of the model bundle?
For reference, here's a rough approximation of what I have now:
input = tf.placeholder(tf.float32, [None, 510], name='input')
# lots of steps omitted for brevity/clarity
outputs = tf.linalg.matmul(outputs, terminal_layer, transpose_b=True, name='output')
sess = tf.Session()
tf.saved_model.simple_save(sess,
                           'model.pb',
                           inputs={'input': input},
                           outputs={'output': outputs})
This turns out to be relatively straightforward, thanks to the tf.lookup.StaticVocabularyTable that's available as part of the TensorFlow standard library.
My model is using a bag of words approach, rather than preserving order, though that would be a pretty simple change to the code.
Assuming you have a list object that encodes your vocabulary (which I've called vocab) and a matrix of corresponding term/token embeddings you want to use (which I've called raw_term_embeddings, since I'm coercing that into a Tensor), the code will look something like this:
initializer = tf.lookup.KeyValueTensorInitializer(vocab, np.arange(len(vocab)))
lut = tf.lookup.StaticVocabularyTable(initializer, 1)  # the 1 here is the out-of-vocabulary bucket size
lut.initializer.run(session=sess)  # pushes the LUT onto the session
input = tf.placeholder(tf.string, [None, None], name='input')
ones_at = lut.lookup(input)
encoded_text = tf.math.reduce_sum(tf.one_hot(ones_at, tf.dtypes.cast(lut.size(), np.int32)), axis=0, keepdims=True)
# I didn't build an embedding for the out of vocabulary token
term_embeddings = tf.convert_to_tensor(np.vstack([raw_term_embeddings]), dtype=tf.float32)
embedded_text = tf.linalg.matmul(encoded_text, term_embeddings)
# then use embedded_text for the remainder of the model
One small trick is to also pass legacy_init_op=tf.tables_initializer() to the save function, which hints TensorFlow Serving to initialize the lookup table for the text encoding when the model is loaded.

Select which tensor to use in middle of TensorFlow graph

In Tensorflow, how would I go about selecting between a python list of Tensors in the middle of my graph as an input to the rest of the graph?
Basically, I have a python list of Tensors that are candidates to be used as inputs in the rest of the graph. I want to select one of them without adding extra dependencies that require all of the Tensors in the list to be computed (I think that would happen if I used tf.cond). How can I select one of them? I can't do it at the python level because I choose the tensor based on a value computed from a placeholder. For example:
x = tf.placeholder(tf.int32, shape=(num_steps, None))
y = tf.placeholder(tf.int32, shape=(None,))
lengths = tf.placeholder(tf.int32, shape=(None,))
# Pretend there is a bunch of lines of code here
output_index = max_sequence_length = tf.reduce_max(lengths)
final_output = potential_outputs[output_index] # won't work, output_index is Tensor
# Pretend the rest of the model uses final_output
More info if you want it:
I am unrolling an RNN and I want to only unroll to the maximum length of the sequence. When this is less than the number of unrolling steps, there is a lot of wasted computation. dynamic_rnn and static_rnn do not meet my needs, so I am trying to come up with my own custom method of unrolling the graph.
To index in tensorflow use tf.slice.
It should be noted that, based on the code you provided, I don't think you are indexing the outputs correctly with tf.reduce_max, since it returns the actual maximum value across a given axis, which may not be an integer (but I'm not sure how your network works). You may be looking for tf.argmax, which returns the index of the maximum value. The issue with this, however, is that tensorflow does not have a gradient defined for tf.argmax, so that function cannot be a learned part of your network.
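As a concrete sketch of the indexing part (here using tf.gather, which plays the same role as tf.slice for picking one element by a Tensor index; note that stacking still builds every candidate tensor, so it does not avoid the extra computation you mentioned):
import tensorflow as tf
# Dummy candidates standing in for the question's potential_outputs list.
potential_outputs = [tf.constant([1.0, 1.0]),
                     tf.constant([2.0, 2.0]),
                     tf.constant([3.0, 3.0])]
output_index = tf.constant(1)                    # a scalar integer Tensor, as in the question
stacked = tf.stack(potential_outputs)            # shape: [num_candidates, ...]
final_output = tf.gather(stacked, output_index)  # selects stacked[output_index]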
