I followed the instructions in the filter visualization and classification example to get the fc6 (fully connected layer 6) response to multiple different images in a folder, using a pretrained model (the BVLC reference model), but I get the same vector for all of the images.
Here is the code I used:
import os
import numpy as np
import caffe

caffe.set_mode_cpu()
net = caffe.Classifier(MODEL_FILE, PRETRAINED,
                       mean=np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1),
                       channel_swap=(2, 1, 0),
                       raw_scale=255,
                       image_dims=(256, 256))

filenames = next(os.walk(path))[2]
fc6Respose = []
for i in range(0, len(filenames)):
    input_image = caffe.io.load_image(path + filenames[i])
    scores = net.predict([input_image])
    feat = net.blobs['fc6'].data[4]
    fc6Respose.append(feat)
PS: Is there a simple way to store this data in a file (such as txt or csv) that can be kept for later and read and opened without using Python?
You are accessing only a single element of the fc6 response (the one at index 4). It might be that this element of the output is degenerate for the kind of inputs you tested on. Try looking at the entire fc6 response.
Moreover, I'm not sure what model you are using, but are you certain this specific model expects its mean argument to be per-channel mean and not per-pixel?
BTW, you are using oversampling for your input (the default option in caffe.Classifier.predict). This means the output you are getting is actually an average of 10 responses to slightly different versions of the input image (different crops + mirroring). You might want to disable this option using:
scores = net.predict([input_image], oversample=False)
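Regarding the PS: once you copy the full fc6 vector out of the blob (net.blobs[...].data is reused on every forward pass, so without .copy() every stored entry ends up pointing at the same buffer), numpy can write the collected vectors to a plain CSV that can be opened outside Python. A minimal sketch along those lines, reusing the variables from the question:

import numpy as np

fc6_responses = []
for name in filenames:
    input_image = caffe.io.load_image(path + name)
    net.predict([input_image], oversample=False)
    # .copy() so the stored vector is not overwritten by the next forward pass
    fc6_responses.append(net.blobs['fc6'].data[0].copy())

# One row per image; readable by any spreadsheet or text editor
np.savetxt('fc6_responses.csv', np.array(fc6_responses), delimiter=',')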
I have a .pth file and no model information is available. How can I find the number of final output classes in the .pth file?
For visualization, I can use Netron to see the number of classes, but how can I get the same number in Python?
You can load the checkpoint and inspect the shape of the weights of the last layer. Depending on whether the last layer has two parameters (kernel and bias), you may have to inspect both.
Here is an example of how to inspect the last weight.
import torch
checkpoint = torch.load('_.pt')
last_key = list(checkpoint)[-1]
print(checkpoint[last_key].size())
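If the final layer is a standard classification head (e.g. nn.Linear), its weight has shape (num_classes, in_features) and its bias has shape (num_classes,), so the class count can be read from the first dimension. A small sketch, with the file name as a placeholder:

import torch

checkpoint = torch.load('model.pth', map_location='cpu')  # placeholder path
last_key = list(checkpoint)[-1]
# For an nn.Linear head, both the weight and the bias have num_classes
# as their first dimension.
num_classes = checkpoint[last_key].size(0)
print(num_classes)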
Here is how I did it.
model = torch.load('path to the model')
# the checkpoint stores the state dict under the 'model' key
last_key = list(model['model'].keys())[-1]
print(model['model'][last_key].size())
Hoping that somebody can help out with a TensorFlow query. It's not a difficult one, I'm sure. I am just somewhat lacking in knowledge relating to TensorFlow and NumPy.
Without any prior experience of TensorFlow I have implemented Python code from a tutorial for doing image classification. This works. Once trained, it can tell the difference between a cat and a dog.
This is currently hard-wired for a single image. I would like to be able to classify multiple images (i.e. the contents of a folder), and do this efficiently. What I have done so far in an effort to achieve this is to simply add a loop around everything, so it runs all the code for each image. However, timing the operation shows that classification of each successive image takes longer than the previous one. Therefore there is some kind of incremental overhead. Some operation is taking more time with every loop. I cannot immediately see what this is.
There are two options to improve this. Either:
(1) Leave the loop largely as it is and prevent this slowdown, or
(2) (Preferable IMHO, if it is possible) Pass a list of images to TensorFlow for classification, and get back a list of results. This seems more efficient.
This is the code:
import tensorflow as tf
import numpy as np
import os, glob, cv2
import sys, argparse
import time

try:
    inputdir = [redacted - insert input dir here]
    for f in os.listdir(inputdir):
        start_time = time.time()
        filename = os.path.join(inputdir, f)
        image_size = 128
        num_channels = 3
        images = []
        image = cv2.imread(filename)  # read image using OpenCV
        # Resize image to desired size and preprocess exactly as done during training...
        image = cv2.resize(image, (image_size, image_size), 0, 0, cv2.INTER_LINEAR)
        images.append(image)
        images = np.array(images, dtype=np.uint8)
        images = images.astype('float32')
        images = np.multiply(images, 1.0 / 255.0)
        # The input to the network is of shape [None, image_size, image_size, num_channels]. Hence we reshape.
        x_batch = images.reshape(1, image_size, image_size, num_channels)

        sess = tf.Session()  # restore the saved model
        saver = tf.train.import_meta_graph('dogs-cats-model.meta')  # Step 1: Recreate the network graph. At this step only the graph is created.
        saver.restore(sess, tf.train.latest_checkpoint('./'))  # Step 2: Load the weights saved using the restore method.
        graph = tf.get_default_graph()  # access the default graph which we have restored
        # Now get hold of the op that can be processed to get the output.
        # In the original network, y_pred is the tensor that is the prediction of the network.
        y_pred = graph.get_tensor_by_name("y_pred:0")
        ## Feed the images to the input placeholders...
        x = graph.get_tensor_by_name("x:0")
        y_true = graph.get_tensor_by_name("y_true:0")
        y_test_images = np.zeros((1, 2))
        # Create the feed_dict that is required to be fed to calculate y_pred...
        feed_dict_testing = {x: x_batch, y_true: y_test_images}
        result = sess.run(y_pred, feed_dict=feed_dict_testing)
        # Note: result is a numpy.ndarray
        print(f + '\t' + str(result) + ' ' + '%.2f' % (time.time() - start_time) + ' seconds')
        # next image
except:
    import traceback
    tb = traceback.format_exc()
    print(tb)
finally:
    input()  # keep window open until key is pressed
What I tried, in order to modify the above, was to build up a list of images inside the loop using...
images.append(image)
...and then to move the rest of the code out of the loop. However, this didn't work. It resulted in the following error:
ValueError: cannot reshape array of size 294912 into shape
(1,128,128,3)
At this line:
x_batch = images.reshape(1, image_size,image_size,num_channels)
Apparently this reshape call doesn't work (as implemented, at least) on a list of multiple images.
So my questions are:
What is causing the steady increase in image classification time that I have seen as the images are iterated over?
Can I perform classification on multiple images in one go, rather than one-by-one in a loop?
Thanks in advance.
Your issues:
1 a) The main reason it is so slow is that you are re-creating the graph for every image.
1 b) The incremental overhead comes from creating a new session every time without destroying the old one. The with syntax helps with that, e.g.:
with tf.Session(graph=tf.Graph()) as session:
    # do something with the session
But that won't be a noticeable issue after addressing 1 a).
When thinking about the problem, it helps to work out which parts of your code depend on the image and which don't. The only TensorFlow-related part that differs per image is the call to session.run, feeding in the image. Everything else can be moved out of the loop.
2) You can also classify multiple images in one go. The first dimension of x_batch is the batch size; you are currently specifying 1. Note that you may exhaust your memory if you try this with a very large number of images.
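A minimal sketch of how the loop could be restructured along these lines, restoring the graph once and running a single batched sess.run. It reuses the tensor names from the question; the input directory is a placeholder:

import os
import time
import cv2
import numpy as np
import tensorflow as tf

image_size = 128
num_channels = 3
inputdir = 'path/to/images'  # placeholder

# Restore the graph and the weights once, outside any loop.
sess = tf.Session()
saver = tf.train.import_meta_graph('dogs-cats-model.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))
graph = tf.get_default_graph()
y_pred = graph.get_tensor_by_name("y_pred:0")
x = graph.get_tensor_by_name("x:0")
y_true = graph.get_tensor_by_name("y_true:0")

# Load and preprocess all images into one batch of shape (N, 128, 128, 3).
filenames = sorted(os.listdir(inputdir))
images = []
for f in filenames:
    image = cv2.imread(os.path.join(inputdir, f))
    image = cv2.resize(image, (image_size, image_size), 0, 0, cv2.INTER_LINEAR)
    images.append(image)
x_batch = np.array(images, dtype=np.float32) / 255.0

# y_true is only a dummy feed here, as in the original code.
y_test_images = np.zeros((len(images), 2))
start_time = time.time()
results = sess.run(y_pred, feed_dict={x: x_batch, y_true: y_test_images})
print(results, '%.2f seconds' % (time.time() - start_time))
sess.close()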
I want to implement this paper in TensorFlow. Its approach is to first create two separate CNNs and then concatenate them for fine-tuning (as you can see in Figure 1a).
My current situation is: I have two pre-trained and saved models, each of them fed by a data queue input (so no feed_dict and no Dataset API), and now I am ready for fine-tuning. I want to restore them from disk and somehow concatenate them, so that I can define an optimizer which optimizes both networks.
This is my current approach:
# Build data input
aug_img, is_img, it_img = finetuning_read_training_images(trainset_size=num_steps,
                                                          batch_size=batch_size,
                                                          cropped=cropped,
                                                          base_folder=base_folder)

# Load the graphs
tf.reset_default_graph()
print("Loading ECNN graph...")
ecnn_graph = tf.train.import_meta_graph(os.path.join(ecnn_path, "ecnn.meta"), clear_devices=True)
trained_target = tf.get_default_graph().get_tensor_by_name("E-CNN/ecnn_output/ecnn_output_convolution/BiasAdd:0")
augmented_icnn_input = tf.concat([is_img, trained_target], axis=2)
icnn_graph = tf.train.import_meta_graph(os.path.join(icnn_path, "icnn.meta"), clear_devices=True,
                                        input_map={"aug_img": augmented_icnn_input, "it_img": it_img, "is_img": is_img})
The data input function reads 3 batches. aug_img is the source image, which has reflections (e.g. when photographed through a glass panel), augmented with its edge map as a 4th color channel. The ECNN graph should predict the reflection-free edge map. TensorFlow should then augment the plain source image, which is stored in the is_img variable, with the predicted reflection-free edge map; this should happen in the lines beginning with trained_target and augmented_icnn_input. The augmented image is then fed to the ICNN graph, which should produce a reflection-free image, so it is given it_img, which is the target image. It is also fed the non-augmented source image again, only for TensorBoard visualization.
But now I am unable to proceed further. I cannot concatenate the two tensors to create augmented_icnn_input, because I get a ValueError: Tensor("E-CNN/ecnn_output/ecnn_output_convolution/BiasAdd:0", shape=(?, 224, 224, 1), dtype=float32) must be from the same graph as Tensor("batch:1", shape=(?, 224, 224, 3), dtype=float32).
Also, I do not seem to understand input_map in combination with the data input queue correctly: although I have defined aug_img and the other variables in the ICNN, and can see them in TensorBoard, I do not see them in the variables collection and therefore cannot map them.
So I would like to know: Is this the correct approach to combining two subgraphs into a bigger graph? How can I solve the problem that I am unable to concatenate the two tensors in augmented_icnn_input? And why is my input_map not working?
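For reference, here is a minimal sketch of the pattern being attempted (importing two meta graphs into one default graph and wiring them together via input_map). The placeholder input stands in for the queue-based pipeline, the channel axis is assumed to be 3 (NHWC), and the file and tensor names are illustrative only:

import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Build (or import) the data input inside this graph, so its tensors
    # belong to the same graph as the imported subgraphs.
    is_img = tf.placeholder(tf.float32, shape=[None, 224, 224, 3], name='is_img')

    # Import the first subgraph (ECNN) and grab its output tensor.
    ecnn_saver = tf.train.import_meta_graph('ecnn.meta', clear_devices=True)
    ecnn_out = graph.get_tensor_by_name(
        'E-CNN/ecnn_output/ecnn_output_convolution/BiasAdd:0')

    # Concatenate source image and predicted edge map along the channel axis.
    augmented = tf.concat([is_img, ecnn_out], axis=3)

    # Import the second subgraph (ICNN), remapping its input tensor to the
    # concatenated tensor (input_map keys are tensor names such as 'aug_img:0').
    icnn_saver = tf.train.import_meta_graph(
        'icnn.meta', clear_devices=True,
        input_map={'aug_img:0': augmented})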
I want to predict a jpeg image in cloud-ml.
My training model is the Inception model, and I would like to send the input to the first layer of the graph: 'DecodeJpeg/contents:0' (where I have to send a JPEG image). I have set this layer as a possible input by adding, in retrain.py:
inputs = {'image_bytes': 'DecodeJpeg/contents:0'}
tf.add_to_collection('inputs', json.dumps(inputs))
Then I save the results of the training in two files (export and export.meta) with:
saver.save(sess, os.path.join(output_directory,'export'))
and I create a model in cloud-ml using these files.
As suggested in some posts (here, here, and here from Google Cloud's official blog) I'm trying to make the prediction with
gcloud beta ml predict --json-instances=request.json --model=MODEL
where the instance is the JPEG image encoded in base64 format with:
python -c 'import base64, sys, json; img = base64.b64encode(open(sys.argv[1], "rb").read()); print json.dumps({"key":"0", "image_bytes": {"b64": img}})' image.jpg &> request.json
However, the request returns:
error: 'Prediction failed: '
What is wrong with my procedure? Do you have any suggestions?
In particular, from this post I assume that cloud-ml automatically converts the base64-encoded image to JPEG format when it reads a request with image_bytes. Is that correct? If not, how should I do it?
CloudML requires you to feed the graph with a batch of images.
I'm pretty sure this is the issue with re-using retrain.py. See that code's sess.run line; it is feeding a single image at a time. Compare with the batched jpeg placeholder in the flowers sample.
Note that three slightly different TF graphs need to be constructed: training, evaluation, and prediction. See this recent blog post for details. The training and evaluation graphs directly consume embeddings from preprocessing, so they do not contain an Inception graph. For prediction, we need to take image bytes as input and use Inception to extract embeddings.
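For illustration, a batched JPEG input along the lines of the flowers sample could look roughly like this (a sketch, not the sample's exact code; the image size and preprocessing are placeholders):

import tensorflow as tf

def build_batched_jpeg_input(image_size=299):
    # A batch of raw JPEG strings, shape (None,), decoded one by one.
    image_bytes = tf.placeholder(tf.string, shape=[None], name='image_bytes')

    def decode_and_resize(jpeg_str):
        image = tf.image.decode_jpeg(jpeg_str, channels=3)
        image = tf.image.convert_image_dtype(image, tf.float32)
        return tf.image.resize_images(image, [image_size, image_size])

    images = tf.map_fn(decode_and_resize, image_bytes, dtype=tf.float32)
    return image_bytes, images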
For online prediction, you need to export the prediction graph. You should also specify the outputs and a key for the inputs.
To build the prediction graph (the code):
def build_prediction_graph(self):
    """Builds prediction graph and registers appropriate endpoints."""
    tensors = self.build_graph(None, 1, GraphMod.PREDICT)

    keys_placeholder = tf.placeholder(tf.string, shape=[None])
    inputs = {
        'key': keys_placeholder.name,
        'image_bytes': tensors.input_jpeg.name
    }
    tf.add_to_collection('inputs', json.dumps(inputs))

    # To extract the id, we need to add the identity function.
    keys = tf.identity(keys_placeholder)
    outputs = {
        'key': keys.name,
        'prediction': tensors.predictions[0].name,
        'scores': tensors.predictions[1].name
    }
    tf.add_to_collection('outputs', json.dumps(outputs))
To export the prediction graph:
def export(self, last_checkpoint, output_dir):
    # Build and save prediction meta graph and trained variable values.
    with tf.Session(graph=tf.Graph()) as sess:
        self.build_prediction_graph()
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        self.restore_from_checkpoint(sess, self.inception_checkpoint_file,
                                     last_checkpoint)
        saver = tf.train.Saver()
        saver.export_meta_graph(filename=os.path.join(output_dir, 'export.meta'))
        saver.save(sess, os.path.join(output_dir, 'export'), write_meta_graph=False)
last_checkpoint must point to the latest checkpoint file from training:
self.model.export(tf.train.latest_checkpoint(self.train_path), self.model_path)
In your post, you indicated that your inputs collection has only the "image_bytes" tensor alias. However, in the code where you are framing the request, you are including two inputs: one is "key" and the other is "image_bytes". So, my suggestion would be to remove "key" from the request or to add "key" to the inputs collection.
The second issue is that the shape of 'DecodeJpeg/contents:0' is (). For Cloud ML, you need a shape like (None,) so that you can feed it in.
There are some suggestions in the other answers to your question here on how you might follow the public posts to modify your graph, but offhand I can point out these two issues.
Let us know if you encounter any further issues.
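For example, if you go with the first suggestion and keep only the image_bytes alias in the inputs collection, the request would carry just that field; a sketch (Python 3 syntax, file names are placeholders):

import base64
import json

with open('image.jpg', 'rb') as f:
    img = base64.b64encode(f.read()).decode('utf-8')

with open('request.json', 'w') as out:
    json.dump({'image_bytes': {'b64': img}}, out)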
I am using TensorFlow's ImageNet-trained model to extract the last pooling layer's features as representation vectors for a new dataset of images.
The model as is predicts on a new image as follows:
python classify_image.py --image_file new_image.jpeg
I edited the main function so that it takes a folder of images, returns the predictions for all images at once, and writes the feature vectors to a CSV file. Here is how I did that:
def main(_):
    maybe_download_and_extract()
    #image = (FLAGS.image_file if FLAGS.image_file else
    #         os.path.join(FLAGS.model_dir, 'cropped_panda.jpg'))

    # Edit to take a directory of image files instead of a single file
    if FLAGS.data_folder:
        images_folder = FLAGS.data_folder
        list_of_images = os.listdir(images_folder)
    else:
        raise ValueError("Please specify image folder")

    with open("feature_data.csv", "wb") as f:
        feature_writer = csv.writer(f, delimiter='|')
        for image in list_of_images:
            print(image)
            current_features = run_inference_on_image(images_folder + "/" + image)
            feature_writer.writerow([image] + current_features)
It worked just fine for around 21 images but then crashed with the following error:
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1912, in as_graph_def
raise ValueError("GraphDef cannot be larger than 2GB.")
ValueError: GraphDef cannot be larger than 2GB.
I thought that by calling run_inference_on_image(images_folder+"/"+image) the previous image data would be overwritten and only the new image data considered, but that doesn't seem to be the case. How can I resolve this issue?
The problem here is that each call to run_inference_on_image() adds nodes to the same graph, which eventually exceeds the maximum size. There are at least two ways to fix this:
The easy but slow way is to use a different default graph for each call to run_inference_on_image():
for image in list_of_images:
    # ...
    with tf.Graph().as_default():
        current_features = run_inference_on_image(images_folder + "/" + image)
    # ...
The more involved but more efficient way is to modify run_inference_on_image() to run on multiple images. Relocate your for loop to surround this sess.run() call, and you will no longer have to reconstruct the entire model on each call, which should make processing each image much faster.
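A minimal sketch of that restructuring, assuming the create_graph() helper and the 'pool_3:0' tensor name from classify_image.py (the usual choice for the last pooling layer); treat both as assumptions if your copy of the script differs:

import numpy as np
import tensorflow as tf

def run_inference_on_images(image_paths):
    """Builds the graph once, then runs inference for every image in the list."""
    create_graph()  # classify_image.py helper that imports the GraphDef once
    features = []
    with tf.Session() as sess:
        pool_tensor = sess.graph.get_tensor_by_name('pool_3:0')
        for image_path in image_paths:
            image_data = tf.gfile.FastGFile(image_path, 'rb').read()
            feature = sess.run(pool_tensor,
                               {'DecodeJpeg/contents:0': image_data})
            features.append(np.squeeze(feature))
    return features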
You can move create_graph() to somewhere before the loop for image in list_of_images: (which iterates over the files).
What this does is perform inference multiple times on the same graph.
The simplest way is to put create_graph() at the start of the main function.
Then the graph is created only once.
A good explanation of why such errors occur is given here. I encountered the same error while using the tf.data API, and came to understand that as the data is iterated over in the session, ops keep getting appended to the existing graph. So what I did was use tf.reset_default_graph() before building the dataset iterator, to make sure the previous graph is cleared away.
Hope this helps for such a scenario.
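Roughly, that looks like the following (a sketch with placeholder file names, using the TF 1.x tf.data API):

import tensorflow as tf

tf.reset_default_graph()  # clear whatever graph earlier iterations built up

filenames = ['img_001.jpg', 'img_002.jpg']  # placeholder list
dataset = tf.data.Dataset.from_tensor_slices(filenames)
iterator = dataset.make_one_shot_iterator()
next_filename = iterator.get_next()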