Tensorflow: Concat different graphs for finetuning with external data - python

I want to implement this paper in TensorFlow. Its approach is to first create two separate CNNs, and then concatenate them for finetuning (as you can see in figure 1a).
My current situation is: I have two pre-trained and saved models, each of them fed by a data queue input (so no feed_dict and no Dataset API), and now I want to finetune them. I want to restore them from disk and somehow concatenate them, so that I can define an optimizer which optimizes both networks.
This is my current approach:
# Build data input
aug_img, is_img, it_img = finetuning_read_training_images(trainset_size=num_steps,
                                                          batch_size=batch_size,
                                                          cropped=cropped,
                                                          base_folder=base_folder)

# Load the graphs
tf.reset_default_graph()
print("Loading ECNN graph...")
ecnn_graph = tf.train.import_meta_graph(os.path.join(ecnn_path, "ecnn.meta"), clear_devices=True)
trained_target = tf.get_default_graph().get_tensor_by_name("E-CNN/ecnn_output/ecnn_output_convolution/BiasAdd:0")
augmented_icnn_input = tf.concat([is_img, trained_target], axis=2)
icnn_graph = tf.train.import_meta_graph(os.path.join(icnn_path, "icnn.meta"), clear_devices=True,
                                        input_map={"aug_img": augmented_icnn_input, "it_img": it_img, "is_img": is_img})
The data input function reads three batches. aug_img is the source image, which has reflections (e.g. when photographed through a glass panel), augmented with its edge map as a fourth color channel. The E-CNN graph should predict the reflection-free edge map. TensorFlow should then augment the plain source image, stored in the is_img variable, with the predicted reflection-free edge map; this is what the lines beginning with trained_target and augmented_icnn_input are meant to do. The augmented image is then fed to the I-CNN graph, which should produce a reflection-free image, so it is also given it_img, the target image. It is additionally fed the non-augmented source image, but only for TensorBoard visualization.
But now I am unable to proceed any further. I cannot concatenate the two tensors to create augmented_icnn_input, because I get a ValueError: Tensor("E-CNN/ecnn_output/ecnn_output_convolution/BiasAdd:0", shape=(?, 224, 224, 1), dtype=float32) must be from the same graph as Tensor("batch:1", shape=(?, 224, 224, 3), dtype=float32).
Also, I seem not to understand the input_map in combination with the data input queue correctly: although I have defined aug_img and the other variables in the I-CNN, and see them in TensorBoard, I do not see them in the variables collection and therefore cannot map them.
So I would like to know: Is this the correct approach to combine two subgraphs into a bigger graph? How can I solve the problem that I am unable to concatenate the two tensors in augmented_icnn_input? And why is my input_map not working?
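(For anyone hitting the same ValueError: tf.train.import_meta_graph adds nodes to the current default graph, so everything that must connect, input queues included, has to be created after tf.reset_default_graph(). A minimal sketch of that ordering, reusing the names from the code above:)

tf.reset_default_graph()

# Build the input queue inside the fresh default graph.
aug_img, is_img, it_img = finetuning_read_training_images(
    trainset_size=num_steps, batch_size=batch_size,
    cropped=cropped, base_folder=base_folder)

# Import the E-CNN into the same graph.
ecnn_saver = tf.train.import_meta_graph(
    os.path.join(ecnn_path, "ecnn.meta"), clear_devices=True)
trained_target = tf.get_default_graph().get_tensor_by_name(
    "E-CNN/ecnn_output/ecnn_output_convolution/BiasAdd:0")

# Both tensors now live in one graph; to append a 4th channel, concatenate
# along the channel axis (axis=3 for NHWC tensors of shape (?, 224, 224, C)).
augmented_icnn_input = tf.concat([is_img, trained_target], axis=3)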

Related

How to extract layer shape and type from ONNX / PyTorch?

I would like to 'translate' a PyTorch model to another framework (non-tf/keras).
I'm trying to take a PyTorch model and automate the translation to the other framework, which contains similar types of layers (e.g. conv2d, dense, ...).
Is there a way, from PyTorch directly or through ONNX, to retrieve a model's layers, their types, shapes and connections? (Weights are not important so far.)
From the discussion in the comments on your question:
Each node in ONNX has a list of named inputs and a list of named outputs.
For the input list, accessed with node.input, each entry is either the name of a graph input that feeds that input or the name of a previous node's output that feeds it. There are also initializers, which are the ONNX parameters.
# model is an onnx model
graph = model.graph

# graph inputs
for input_name in graph.input:
    print(input_name.name)

# graph parameters (initializers)
for init in graph.initializer:
    print(init.name)

# graph outputs
for output_name in graph.output:
    print(output_name.name)

# iterate over nodes
for node in graph.node:
    # node inputs
    for idx, node_input_name in enumerate(node.input):
        print(idx, node_input_name)
    # node outputs
    for idx, node_output_name in enumerate(node.output):
        print(idx, node_output_name)
Shape inference is discussed here, and for Python here.
The gist for python is found here
Reproducing the gist from 3:
from onnx import shape_inference
inferred_model = shape_inference.infer_shapes(original_model)
and find the shape info in inferred_model.graph.value_info.
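Continuing the gist, a small sketch of how to read those shapes back out of value_info (each entry is a ValueInfoProto; a dimension is either a concrete dim_value or a symbolic dim_param):

for vi in inferred_model.graph.value_info:
    # Each dimension carries either a concrete dim_value or a symbolic dim_param.
    dims = [d.dim_value if d.HasField("dim_value") else d.dim_param
            for d in vi.type.tensor_type.shape.dim]
    print(vi.name, dims)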
You can also use Netron (also available on GitHub) to get a visual representation of that information.

Post-training full integer quantization of Keras model

I'm trying to do post-training full 8-bit quantization of a Keras model to compile and deploy to EdgeTPU.
I have a trained Keras model saved as .h5 file, and am trying to go through the steps as specified here: https://coral.withgoogle.com/docs/edgetpu/models-intro/, for deployment to the Coral Dev Board.
I'm following these instructions for quantization: https://www.tensorflow.org/lite/performance/post_training_quantization#full_integer_quantization_of_weights_and_activations
I’m trying to use the following code:
import tensorflow as tf

num_calibration_steps = 100

def representative_dataset_gen():
    for _ in range(num_calibration_steps):
        # Get sample input data as a numpy array in a method of your choosing.
        yield [X_train_quant_conv]

converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file('/tmp/classNN_simple.h5')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
converter.representative_dataset = representative_dataset_gen
tflite_full_integer_quant_model = converter.convert()
where X_train_quant_conv is a subset of my training data, converted to an np.array of type np.float32.
When running this piece of code, I get the following error:
ValueError: Cannot set tensor: Dimension mismatch
I’ve tried changing the function representative_dataset_gen() in different ways, but every time I get a new error. I’m not sure what this function should look like. I’m also unsure what value num_calibration_steps should have.
Any suggestions or working examples are very appreciated.
This question is very similar to this answered question: Convert Keras model to quantized Tensorflow Lite model that can be used on Edge TPU
You might want to look at my demo script for quantization, on github.
It's just a guess, since I can't see what X_train_quant_conv really is, but in my working demo I yield one image at a time (random data created on the fly, in my case) in representative_dataset_gen(). The image is stored as a batch of size 1 (e.g., the tensor shape is (1, 56, 56, 32) for my 56x56x32 image). There are 32 channels, though there would typically be just 3 for a color image. I think representative_dataset_gen() has to yield a list containing a tensor (or more than one?) whose first dimension is of length 1.
image_shape = (56, 56, 32)

def representative_dataset_gen():
    num_calibration_images = 10
    for i in range(num_calibration_images):
        image = tf.random.normal([1] + list(image_shape))
        yield [image]
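Applied to the question's setup, a sketch of the same pattern, assuming X_train_quant_conv is a float32 array of shape (num_samples, height, width, channels), would yield one sample per calibration step:

def representative_dataset_gen():
    for i in range(num_calibration_steps):
        # Slicing keeps a leading batch dimension of 1: shape (1, H, W, C).
        yield [X_train_quant_conv[i:i + 1]]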

TensorFlow: How to perform image categorisation on multiple images

Hoping that somebody can help out with a TensorFlow query. It's not a difficult one, I'm sure. I am just somewhat lacking in knowledge relating to TensorFlow and NumPy.
Without any prior experience of TensorFlow I have implemented Python code from a tutorial for doing image classification. This works. Once trained, it can tell the difference between a cat and a dog.
This is currently hard-wired for a single image. I would like to be able to classify multiple images (i.e. the contents of a folder), and do this efficiently. What I have done so far in an effort to achieve this is to simply add a loop around everything, so it runs all the code for each image. However, timing the operation shows that classification of each successive image takes longer than the previous one. Therefore there is some kind of incremental overhead. Some operation is taking more time with every loop. I cannot immediately see what this is.
There are two options to improve this. Either:
(1) Leave the loop largely as it is and prevent this slowdown, or
(2) (Preferable IMHO, if it is possible) Pass a list of images to TensorFlow for classification, and get back a list of results. This seems more efficient.
This is the code:
import tensorflow as tf
import numpy as np
import os, glob, cv2
import sys, argparse
import time

try:
    inputdir = [redacted - insert input dir here]
    for f in os.listdir(inputdir):
        start_time = time.time()
        filename = os.path.join(inputdir, f)
        image_size = 128
        num_channels = 3
        images = []
        image = cv2.imread(filename)  # read image using OpenCV
        # Resize image to desired size and preprocess exactly as done during training...
        image = cv2.resize(image, (image_size, image_size), 0, 0, cv2.INTER_LINEAR)
        images.append(image)
        images = np.array(images, dtype=np.uint8)
        images = images.astype('float32')
        images = np.multiply(images, 1.0 / 255.0)
        # The input to the network is of shape [None image_size image_size num_channels]. Hence we reshape.
        x_batch = images.reshape(1, image_size, image_size, num_channels)

        sess = tf.Session()  # restore the saved model
        saver = tf.train.import_meta_graph('dogs-cats-model.meta')  # Step 1: Recreate the network graph. At this step only the graph is created.
        saver.restore(sess, tf.train.latest_checkpoint('./'))  # Step 2: Load the weights saved using the restore method.
        graph = tf.get_default_graph()  # access the default graph which we have restored
        # Now get hold of the op that can be processed to get the output.
        # In the original network y_pred is the tensor that is the prediction of the network.
        y_pred = graph.get_tensor_by_name("y_pred:0")
        ## Feed the images to the input placeholders...
        x = graph.get_tensor_by_name("x:0")
        y_true = graph.get_tensor_by_name("y_true:0")
        y_test_images = np.zeros((1, 2))
        # Create the feed_dict that is required to be fed to calculate y_pred...
        feed_dict_testing = {x: x_batch, y_true: y_test_images}
        result = sess.run(y_pred, feed_dict=feed_dict_testing)
        # Note: result is a numpy.ndarray
        print(f + '\t' + str(result) + ' ' + '%.2f' % (time.time() - start_time) + ' seconds')
        # next image
except:
    import traceback
    tb = traceback.format_exc()
    print(tb)
finally:
    input()  # keep window open until key is pressed
What I tried to do to modify the above was to accumulate all the images into a list using...
images.append(image)
...and then taking the rest of the code out of the loop. However, this didn't work. It resulted in the following error:
ValueError: cannot reshape array of size 294912 into shape (1,128,128,3)
At this line:
x_batch = images.reshape(1, image_size,image_size,num_channels)
Apparently this reshape (as implemented, at least) doesn't work on an array holding multiple images.
So my questions are:
What would cause the steady increase in image classification time that I have seen as the images are iterated?
Can I perform classification on multiple images in one go, rather than one-by-one in a loop?
Thanks in advance.
Your issues:
1 a) The main reason why it is so slow: you are re-creating the graph for every image.
1 b) The incremental overhead comes from creating a new session every time without destroying the old one. The with syntax helps with that, e.g.:
with tf.Session(graph=tf.Graph()) as session:
    # do something with the session
But that won't be a noticeable issue after addressing a).
When thinking about the problem, one might realise which parts of your code depend on the image and which don't. The TensorFlow-related part that differs per image is the call to session.run, feeding in the image. Everything else can be moved out of the loop.
2) You can also classify multiple images in one go. The first dimension of x_batch is the batch size, and you are currently specifying one. But you may exhaust your memory trying to do that for a very large number of images.
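Putting both points together, here is a sketch of the restructured code. It reuses the question's tensor names ("x:0", "y_true:0", "y_pred:0") and preprocessing, which are assumptions about the saved model:

import os
import cv2
import numpy as np
import tensorflow as tf

image_size, num_channels = 128, 3

# Preprocess all images first (pure NumPy/OpenCV, no TensorFlow involved).
filenames = sorted(os.listdir(inputdir))
images = []
for f in filenames:
    image = cv2.imread(os.path.join(inputdir, f))
    image = cv2.resize(image, (image_size, image_size), 0, 0, cv2.INTER_LINEAR)
    images.append(image)
x_batch = np.array(images, dtype=np.float32) / 255.0  # shape (N, 128, 128, 3)

# Build the graph once and look up the tensors once, outside any loop.
saver = tf.train.import_meta_graph('dogs-cats-model.meta')
graph = tf.get_default_graph()
x = graph.get_tensor_by_name("x:0")
y_true = graph.get_tensor_by_name("y_true:0")
y_pred = graph.get_tensor_by_name("y_pred:0")

with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('./'))  # load the weights once
    # One run over the whole batch; the first dimension of x_batch is the batch size.
    results = sess.run(y_pred, feed_dict={x: x_batch,
                                          y_true: np.zeros((len(images), 2))})

for f, result in zip(filenames, results):
    print(f, result)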

How to make correct predictions for a jpeg image in cloud-ml

I want to predict a jpeg image in cloud-ml.
My training model is the Inception model, and I would like to send the input to the first layer of the graph: 'DecodeJpeg/contents:0' (to which I have to send a jpeg image). I have set this layer as a possible input by adding the following in retrain.py:
inputs = {'image_bytes': 'DecodeJpeg/contents:0'}
tf.add_to_collection('inputs', json.dumps(inputs))
Then I save the results of the training in two files (export and export.meta) with:
saver.save(sess, os.path.join(output_directory,'export'))
and I create a model in cloud-ml using these files.
As suggested in some posts (here, here, and here from the official Google Cloud blog), I'm trying to make the prediction with
gcloud beta ml predict --json-instances=request.json --model=MODEL
where the instance is the jpeg image decoded in base64 format with:
python -c 'import base64, sys, json; img = base64.b64encode(open(sys.argv[1], "rb").read()); print json.dumps({"key":"0", "image_bytes": {"b64": img}})' image.jpg &> request.json
However, the request returns:
error: 'Prediction failed: '
What is the problem with my procedure? Do you have any suggestions?
In particular, from this post I assume that cloud-ml automatically converts the base64 image to jpeg format when it reads a request with image_bytes. Is that correct? If not, how can I do it?
CloudML requires you to feed the graph with a batch of images.
I'm pretty sure this is the issue with re-using retrain.py. See that code's sess.run line; it is feeding a single image at a time. Compare with the batched jpeg placeholder in the flowers sample.
Note that three slightly different TF graphs need to be constructed: training, evaluation, and prediction. See this recent blog post for details. The training and evaluation graphs directly consume the embeddings from preprocessing, so they do not contain an Inception graph. For prediction, we need to take image bytes as input and use Inception to extract the embeddings.
For online prediction, you need to export the prediction graph. You should also specify the outputs and a key for the inputs.
To build the prediction graph (the code):
def build_prediction_graph(self):
    """Builds prediction graph and registers appropriate endpoints."""
    tensors = self.build_graph(None, 1, GraphMod.PREDICT)

    keys_placeholder = tf.placeholder(tf.string, shape=[None])
    inputs = {
        'key': keys_placeholder.name,
        'image_bytes': tensors.input_jpeg.name
    }
    tf.add_to_collection('inputs', json.dumps(inputs))

    # To extract the id, we need to add the identity function.
    keys = tf.identity(keys_placeholder)
    outputs = {
        'key': keys.name,
        'prediction': tensors.predictions[0].name,
        'scores': tensors.predictions[1].name
    }
    tf.add_to_collection('outputs', json.dumps(outputs))
To export the prediction graph:
def export(self, last_checkpoint, output_dir):
    # Build and save prediction meta graph and trained variable values.
    with tf.Session(graph=tf.Graph()) as sess:
        self.build_prediction_graph()
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        self.restore_from_checkpoint(sess, self.inception_checkpoint_file,
                                     last_checkpoint)
        saver = tf.train.Saver()
        saver.export_meta_graph(filename=os.path.join(output_dir, 'export.meta'))
        saver.save(sess, os.path.join(output_dir, 'export'), write_meta_graph=False)
last_checkpoint must point to the latest checkpoint file from training:
self.model.export(tf.train.latest_checkpoint(self.train_path), self.model_path)
In your post, you indicated that your inputs collection has only the "image_bytes" tensor alias. However, in the code where you are framing the request, you are including two inputs: one is "key" and the other is "image_bytes". So my suggestion would be to remove "key" from the request, or to add "key" to the inputs collection.
The second issue is that the shape of 'DecodeJpeg/contents:0' is (). For Cloud ML, you need a shape like (None,) so that you can feed it in.
There are some suggestions in the other answers to your question here on how you might follow the public posts to modify your graph, but offhand I can point out these two issues. A sketch of a batched input follows.
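For illustration, a minimal TF1-style sketch of such a batched input (the placeholder name and target size are assumptions, not taken from the flowers sample):

# A string placeholder with shape (None,) in place of the scalar
# 'DecodeJpeg/contents:0'; each element is one JPEG-encoded image.
image_bytes = tf.placeholder(tf.string, shape=[None], name='input_jpeg')

def decode_and_resize(jpeg_bytes):
    image = tf.image.decode_jpeg(jpeg_bytes, channels=3)
    image = tf.image.resize_images(tf.expand_dims(image, 0), [299, 299])
    return tf.squeeze(image, axis=0)

# Decode and resize each image in the batch independently.
images = tf.map_fn(decode_and_resize, image_bytes, dtype=tf.float32)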
Let us know if you encounter any further issues.

Same fc6 response for different images

I followed the instructions in the filter visualization and classification example to get the fc6 (fully connected layer 6) response to multiple different images in a folder, using a pretrained model (the BVLC reference model), but I get the same vector for all of the images.
Here is the code I used:
import os
import numpy as np
import caffe

caffe.set_mode_cpu()
net = caffe.Classifier(MODEL_FILE, PRETRAINED,
                       mean=np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1),
                       channel_swap=(2, 1, 0),
                       raw_scale=255,
                       image_dims=(256, 256))

filenames = next(os.walk(path))[2]
fc6Response = []
for i in range(0, len(filenames)):
    input_image = caffe.io.load_image(path + filenames[i])
    scores = net.predict([input_image])
    feat = net.blobs['fc6'].data[4]
    fc6Response.append(feat)
PS: Is there any simple way to store this data in a file (like txt or csv) that can be used later and can be read and opened without using Python?
You are accessing only a single element of the fc6 response (the one at index 4). It might be the case that this element in the output is degenerate for the kind of inputs you tested it on. Try looking at the entire fc6 response.
Moreover, I'm not sure what model you are using, but are you certain this specific model expects its mean argument to be per-channel mean and not per-pixel?
BTW, you are using oversampling for your input (the default option in caffe.Classifier.predict); this means the output you are getting is actually an average of 10 responses to slightly different input images (different croppings and mirrorings). You might want to disable this option using
scores = net.predict([input_image], oversample=False)
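Following up on the PS and on inspecting the entire response: a minimal sketch, assuming the net, path and filenames variables from the question, that copies the full fc6 vector per image and writes everything to a CSV readable outside Python:

import numpy as np

fc6_responses = []
for fname in filenames:
    input_image = caffe.io.load_image(path + fname)
    net.predict([input_image], oversample=False)
    # .copy() matters here: net.blobs['fc6'].data is reused across predict
    # calls, so appending it without copying would yield identical rows.
    fc6_responses.append(net.blobs['fc6'].data[0].copy())

# One row per image; plain text, readable without Python.
np.savetxt('fc6_responses.csv', np.vstack(fc6_responses), delimiter=',')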
