I am using Keras with Tensorflow backend and want to evaluate the PSNR between two images. I am able to evaluate on all three RGB channels, like this:
def psnr(hr, sr):
    return tf.image.psnr(hr, sr, max_val=255)
using the psnr function from TensorFlow (tf.image.psnr).
But what would I do to only evaluate on the first channel? I assume I need to extract those values from the tensor somehow. In Python it is usually possible to do something like hr[:, 0, 0], but this obviously does not work here.
import numpy as np
import tensorflow as tf

im1 = tf.image.convert_image_dtype(np.random.randn(64, 64, 3), tf.float32)
im2 = tf.image.convert_image_dtype(np.random.randn(64, 64, 3), tf.float32)
# Slice out the first channel and restore the channel axis before computing PSNR.
psnr = tf.image.psnr(tf.expand_dims(im1[:, :, 0], 2),
                     tf.expand_dims(im2[:, :, 0], 2), max_val=255)
with tf.Session() as sess:
    print(sess.run(psnr))
Get the first channel using im1[:,:,0] and reshape it to h x w x 1 by adding the channel dimension back with expand_dims.
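If you need this as a Keras metric, here is a minimal sketch (the function name and the assumption of batched, channels-last tensors are mine, not from the original answer):

import tensorflow as tf

def psnr_first_channel(hr, sr):
    # Assumes batched, channels-last tensors of shape (batch, h, w, c).
    hr_ch0 = tf.expand_dims(hr[:, :, :, 0], axis=-1)
    sr_ch0 = tf.expand_dims(sr[:, :, :, 0], axis=-1)
    return tf.image.psnr(hr_ch0, sr_ch0, max_val=255)

# model.compile(optimizer='adam', loss='mse', metrics=[psnr_first_channel])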
In short, I am struggling in trying to convert an image per-pixel category mask in a tf.data.Dataset from an integer-class encoding to a one-hot encoding.
Consider the image segmentation tensorflow tutorial example here:
https://www.tensorflow.org/tutorials/images/segmentation.
The input is an image and the output is a per-pixel integer-labeled category mask. In their example, the mask has a category value at each pixel represented by an integer: {0, 1, or 2}.
The train and test variables are of type tf.data.Dataset and each sample is an (image,mask) tuple.
This form of mask/output is consistent with the sparse_categorical_crossentropy loss function in the tutorial. However, I would like to be able to use other loss functions that require a one-hot encoding instead.
I have been attempting to convert the datasets via the tf.keras.utils.to_categorical() function using a map() call, i.e.:
def mask_to_categorical(image, mask):
    mask = tf.keras.utils.to_categorical(mask, 3)
    return image, mask

train = train.map(mask_to_categorical)
However, this fails with an error such as:
{...}/tensorflow_core/python/keras/utils/np_utils.py:40 to_categorical
y = np.array(y, dtype='int')
TypeError: __array__() takes 1 positional argument but 2 were given
Note:
My searching thus far has pointed towards eager/non-eager issues as one possible cause. For what it's worth, I verified that I am running in eager mode via:
>>> print('tf.executing_eagerly() = ', tf.executing_eagerly())
tf.executing_eagerly() = True
Any suggestions? Thank you!
Try modifying your function for the one-hot encoding like this:
def mask_to_categorical(image, mask):
    mask = tf.one_hot(tf.cast(mask, tf.int32), 3)
    mask = tf.cast(mask, tf.float32)
    return image, mask
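One caveat (my addition, assuming the mask layout from the segmentation tutorial, i.e. shape (H, W, 1)): tf.one_hot appends the depth axis at the end, so the result would be (H, W, 1, 3). Squeezing the channel axis first avoids that:

import tensorflow as tf

def mask_to_categorical(image, mask):
    # Drop the trailing channel axis, (H, W, 1) -> (H, W), before one-hot
    # so the result is (H, W, 3) instead of (H, W, 1, 3).
    mask = tf.squeeze(mask, axis=-1)
    mask = tf.one_hot(tf.cast(mask, tf.int32), depth=3)
    return image, tf.cast(mask, tf.float32)

train = train.map(mask_to_categorical)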
In my Keras CNN, I add the Input layer like this:
model.add(Conv2D(32, (3, 3), input_shape=(img_width, img_height, nb_channel)))
with nb_channel = 3 for RGB input and 1 for grayscale input, using flow_from_directory and ImageDataGenerator.
However, I want to feed only a specific set of color channels to my CNN, for example only the green and red channels. How can I do so?
I'm using Keras with the TensorFlow backend.
Besides the neat solution from @Minh-Tuan Nguyen, we can also do the slicing as follows:
# custom filter
def filter_layer(x):
    # Slice out the individual channels (channels-last layout).
    red_x = x[:, :, :, 0]
    blue_x = x[:, :, :, 2]
    green_x = x[:, :, :, 1]
    # Restore the channel axis so each slice is (batch, h, w, 1).
    red_x = tf.expand_dims(red_x, axis=3)
    blue_x = tf.expand_dims(blue_x, axis=3)
    green_x = tf.expand_dims(green_x, axis=3)
    output = tf.concat([red_x, blue_x], axis=3)
    return output

# model
input = Input(shape=(img_height, img_width, img_channels))
x = Lambda(filter_layer)(input)
At the concat step we can choose which slices to keep.
You can slice the input tensor inside a custom Lambda layer. Suppose you want only red and green:
model.add(Lambda(lambda x: x[:,:,:,:2], input_shape=(w, h, channels)))
TensorFlow allows NumPy-style slicing; for Keras you need to wrap it in a Lambda layer to incorporate it into your model.
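For context, a minimal sketch of this approach in a Sequential model (the sizes are placeholders, and the Conv2D layer is only there to show where the sliced input goes):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Lambda, Conv2D

img_width, img_height, channels = 224, 224, 3  # placeholder sizes

model = Sequential()
# Keep only the first two channels (red and green) of the RGB input.
model.add(Lambda(lambda x: x[:, :, :, :2],
                 input_shape=(img_width, img_height, channels)))
model.add(Conv2D(32, (3, 3), activation='relu'))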
I think it is easier to do the slicing a bit more "naively" here since, to my knowledge, Keras does not support slicing a tensor with a list of indices the way Python and NumPy do. Below is an example of my code for this problem; see if it fits your requirement.
indices = [0, 2]

def filter_layer(input, indices=indices):
    for i in range(len(indices)):
        index = indices[i]
        # Bind index as a default argument so each Lambda keeps its own value.
        x_temp = Lambda(lambda x, index=index: x[:, :, :, index][..., None])(input)
        if i == 0:
            x = x_temp
        else:
            x = Concatenate(-1)([x, x_temp])
    return x

input = Input(shape=(img_height, img_width, img_channels))
x = filter_layer(input)
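As an aside (my addition, not from the original answer), tf.gather can select an arbitrary list of channel indices in a single call, assuming a channels-last input:

import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda

img_height, img_width, img_channels = 224, 224, 3  # placeholder sizes
indices = [0, 2]  # e.g. red and blue channels

input = Input(shape=(img_height, img_width, img_channels))
# tf.gather picks the listed indices along the channel axis.
x = Lambda(lambda t: tf.gather(t, indices, axis=-1))(input)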
After training the CNN model, I want to visualize or print out the weights. What can I do?
I cannot even print out the variables after training.
Thank you!
To visualize the weights, you can use a tf.image_summary() op to transform a convolutional filter (or a slice of a filter) into a summary proto, write it to a log using a tf.train.SummaryWriter, and visualize the log using TensorBoard.
Let's say you have the following (simplified) program:
filter = tf.Variable(tf.truncated_normal([8, 8, 3]))
images = tf.placeholder(tf.float32, shape=[None, 28, 28])
conv = tf.nn.conv2d(images, filter, strides=[1, 1, 1, 1], padding="SAME")
# More ops...
loss = ...
optimizer = tf.train.GradientDescentOptimizer(0.01)
train_op = optimizer.minimize(loss)

filter_summary = tf.image_summary('filter', filter)

sess = tf.Session()
summary_writer = tf.train.SummaryWriter('/tmp/logs', sess.graph_def)
for i in range(10000):
    sess.run(train_op)
    if i % 10 == 0:
        # Log a summary every 10 steps.
        summary_writer.add_summary(sess.run(filter_summary), i)
After doing this, you can start TensorBoard to visualize the logs in /tmp/logs, and you will be able to see a visualization of the filter.
Note that this trick visualizes depth-3 filters as RGB images (to match the channels of the input image). If you have deeper filters, or they don't make sense to interpret as color channels, you can use the tf.split() op to split the filter on the depth dimension, and generate one image summary per depth.
Like @mrry said, you can use tf.image_summary. For example, for cifar10_train.py, you can put this code somewhere under def train(). Note how you access a variable under the scope 'conv1':
# Visualize conv1 features
with tf.variable_scope('conv1') as scope_conv:
    weights = tf.get_variable('weights')

    # scale weights to [0, 255] and convert to uint8 (maybe change scaling?)
    x_min = tf.reduce_min(weights)
    x_max = tf.reduce_max(weights)
    weights_0_to_1 = (weights - x_min) / (x_max - x_min)
    weights_0_to_255_uint8 = tf.image.convert_image_dtype(weights_0_to_1, dtype=tf.uint8)

    # to tf.image_summary format [batch_size, height, width, channels]
    weights_transposed = tf.transpose(weights_0_to_255_uint8, [3, 0, 1, 2])

    # this will display random 3 filters from the 64 in conv1
    tf.image_summary('conv1/filters', weights_transposed, max_images=3)
If you want to visualize all your conv1 filters in one nice grid, you would have to organize them into a grid yourself. I did that today, so now I'd like to share a gist for visualizing conv1 as a grid.
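A rough sketch of the tiling idea (pure NumPy, my own simplification rather than the gist itself): normalize each filter and paste it into one large 2D array, which can then be shown with imshow.

import numpy as np

def filters_to_grid(weights, pad=1):
    """Tile conv filters of shape (kh, kw, in_ch, out_ch) into one 2D grid image."""
    kh, kw, in_ch, out_ch = weights.shape
    cols = int(np.ceil(np.sqrt(out_ch)))
    rows = (out_ch + cols - 1) // cols
    grid = np.zeros((rows * (kh + pad), cols * (kw + pad)))
    for idx in range(out_ch):
        r, c = divmod(idx, cols)
        # Use the first input channel of each filter, normalized to [0, 1].
        f = weights[:, :, 0, idx]
        f = (f - f.min()) / (f.max() - f.min() + 1e-8)
        grid[r * (kh + pad):r * (kh + pad) + kh,
             c * (kw + pad):c * (kw + pad) + kw] = f
    return grid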
You can extract the values as numpy arrays the following way:
with tf.variable_scope('conv1', reuse=True) as scope_conv:
    W_conv1 = tf.get_variable('weights', shape=[5, 5, 1, 32])
    weights = W_conv1.eval()
    with open("conv1.weights.npz", "wb") as outfile:  # binary mode for np.save
        np.save(outfile, weights)
Note that you have to adjust the scope ('conv1' in my case) and the variable name ('weights' in my case).
Then it boils down to visualizing numpy arrays. One example of how to visualize numpy arrays is:
#!/usr/bin/env python
"""Visualize numpy arrays."""
import numpy as np
import scipy.misc

arr = np.load('conv1.weights.npz')

# Get each 5x5 filter from the 5x5x1x32 array
for filter_ in range(arr.shape[3]):
    # Get the 5x5x1 filter:
    extracted_filter = arr[:, :, :, filter_]

    # Get rid of the last dimension (hence get 5x5):
    extracted_filter = np.squeeze(extracted_filter)

    # display the filter (might be very small - you can resize the window)
    scipy.misc.imshow(extracted_filter)
Using the TensorFlow 2 API, there are several options:
Weights can be extracted using the get_weights() function.
weights_n = model.layers[n].get_weights()[0]
Biases can be extracted using the numpy() conversion method.
bias_n = model.layers[n].bias.numpy()
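For completeness, a minimal sketch (my addition, assuming layer n is a Conv2D layer in a trained Keras model named model) that plots a few of the extracted filters with matplotlib:

import matplotlib.pyplot as plt

n = 0  # index of a Conv2D layer in the trained model (assumption)
weights = model.layers[n].get_weights()[0]  # shape: (kh, kw, in_channels, out_channels)

num_to_show = min(weights.shape[-1], 8)
fig, axes = plt.subplots(1, num_to_show, squeeze=False, figsize=(2 * num_to_show, 2))
for i in range(num_to_show):
    # Show the first input channel of filter i as a grayscale image.
    axes[0][i].imshow(weights[:, :, 0, i], cmap='gray')
    axes[0][i].axis('off')
plt.show()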
I am trying to obtain a prediction with gcloud by passing a base64 encoded image to a retrained inception model, by using a similar approach as the one adopted by Davide Biraghi in this post.
When using 'DecodeJpeg/contents:0' as input I also get the same error when trying to get predictions, therefore I adopted a slightly different approach.
Following rhaertel80's suggestions in his answer to this post, I have created a graph that takes a jpeg image as input in 'B64Connector/input', preprocesses it and feeds it to the inception model in 'ResizeBilinear:0'.
The prediction returns values, although the wrong ones (I am trying to find a solution in another post), but at least it doesn't fail. The placeholder I use as input is:
images_placeholder = tf.placeholder(dtype=tf.string, shape=(None,), name='B64Connector/input')
And I add it to the model inputs with
inputs = {"b64_bytes": 'B64Connector/input:0'}
tf.add_to_collection("inputs", json.dumps(inputs))
Like Davide, I am following the suggestions found in these posts: here, here and here, and I am trying to get predictions with
gcloud beta ml predict --json-instances=request.json --model=MODEL
where the file request.json has been obtained with this code
jpgtxt = base64.b64encode(open(imagefile ,"rb").read())
with open( outputfile, 'w' ) as f :
f.write( json.dumps( {"b64_bytes": {"b64": jpgtxt}} ) )
I would like to know why the prediction fails when I use 'DecodeJpeg/contents:0' as input but not when I use this different approach, since they look almost identical to me: I use the same script to generate the instances (changing the input_key) and the same command line to request predictions.
Is there a way to pass the instance fed to 'B64Connector/input:0' to 'DecodeJpeg/contents:0' in order to get the right predictions?
Here I describe my approach in more detail and how I use the images_placeholder.
I define a function that resizes the image:
def decode_and_resize(image_str_tensor):
    """Decodes a jpeg string, resizes it and returns a uint8 tensor."""
    image = tf.image.decode_jpeg(image_str_tensor, channels=MODEL_INPUT_DEPTH)

    # Note resize expects a batch_size, but tf_map suppresses that index,
    # thus we have to expand then squeeze. Resize returns float32 in the
    # range [0, uint8_max]
    image = tf.expand_dims(image, 0)
    image = tf.image.resize_bilinear(
        image, [MODEL_INPUT_HEIGHT, MODEL_INPUT_WIDTH], align_corners=False)
    image = tf.squeeze(image, squeeze_dims=[0])
    image = tf.cast(image, dtype=tf.uint8)
    return image
and one that generates the definition of the graph in which the resize takes place and where images_placeholder is defined and used
def create_b64_graph():
    with tf.Graph().as_default() as b64_graph:
        images_placeholder = tf.placeholder(dtype=tf.string, shape=(None,),
                                            name='B64Connector/input')
        decoded_images = tf.map_fn(
            decode_and_resize, images_placeholder, back_prop=False, dtype=tf.uint8)

        # convert_image_dtype also scales [0, uint8_max] -> [0, 1).
        images = tf.image.convert_image_dtype(decoded_images, dtype=tf.float32)

        # Finally, rescale to [-1, 1] instead of [0, 1)
        images = tf.sub(images, 0.5)
        images = tf.mul(images, 2.0)

        # NOTE: using identity to get a known name for the output tensor.
        output = tf.identity(images, name='B64Connector/output')

    b64_graph_def = b64_graph.as_graph_def()
    return b64_graph_def
Moreover I am using the following code to merge the resizing graph with the inception graph. Can I use a similar approach to link images_placeholder directly to 'DecodeJpeg/contents:0'?
def concatenate_to_inception_graph(b64_graph_def):
    model_dir = INPUT_MODEL_PATH
    model_filename = os.path.join(
        model_dir, 'classify_image_graph_def.pb')

    with tf.Session() as sess:
        # Import the b64_graph and get its output tensor
        resized_b64_tensor, = (tf.import_graph_def(b64_graph_def, name='',
            return_elements=['B64Connector/output:0']))

        with gfile.FastGFile(model_filename, 'rb') as f:
            inception_graph_def = tf.GraphDef()
            inception_graph_def.ParseFromString(f.read())

            # Concatenate b64_graph and inception_graph
            g_1 = tf.import_graph_def(inception_graph_def, name='inception',
                input_map={'ResizeBilinear:0': resized_b64_tensor})

        return sess.graph
I'd like to be able to read in batches of images. However, these batches must be constructed by a python script (they cannot be placed into a file ahead of time for various reasons). What's the most efficient way, in TensorFlow, to do the following:
(1) Provided: a Python list of B variable-length strings that point to images that all have the same size; B is the batch size.
(2) For each string, load the image it corresponds to, and apply a random crop of 5% (the crop is random but the size of the crop is fixed)
(3) Concatenate the images together into a tensor of size B x H x W x 3
If this is not possible, does anyone have any benchmarks / data on the efficiency loss of loading and preprocessing the images in python and then placing them into a queue? I assume the net will run considerably faster if image loading / preprocessing is done internally in TensorFlow.
This is how I understand your problem:
you have some images
you have a function sample_batch() which returns a batch of filenames of size B
you want to read the images corresponding to these filenames and preprocess them
finally you output a batch of these examples
# Queue of filenames, fed from the Python side
input = tf.placeholder(tf.string, name='Input')
queue = tf.FIFOQueue(capacity, tf.string, [()], name='Queue')
enqueue_op = queue.enqueue_many(input)

# Read and decode one image per dequeued filename
reader = tf.WholeFileReader()
filename, content = reader.read(queue)
image = tf.image.decode_jpeg(content, channels=3)

# Preprocessing
image = tf.random_crop(image, [H, W, 3])
image = tf.to_float(image)

# Batch the preprocessed images together
batch_image = tf.train.batch([image], batch_size=B, name='Batch')
output = inference(batch_image)
Then in the session, you have to run the enqueue operation with the filenames from your sample_batch() function:
with tf.Session() as sess:
    tf.train.start_queue_runners()
    for i in range(NUM_STEPS):
        batch_filenames = sample_batch()
        sess.run(enqueue_op, feed_dict={input: batch_filenames})
        sess.run(output)
If you have the images as a byte array you can use something similar to this in your graph:
jpegs = tf.placeholder(tf.string, shape=(None))
images = tf.map_fn(lambda jpeg: your_processing_fn(jpeg), jpegs,
                   dtype=tf.float32)
logits = your_inference_model(images, labels)
where your_processing_fn is a function which receives a tensor of jpeg bytes, decodes it, resizes and crops it, and returns an image of size H x W x 3.
You need a recent version of TensorFlow, as map_fn is not available in 0.8 and below.
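A minimal sketch of what your_processing_fn might look like (the crop size and the use of a fixed H x W are assumptions on my part, not from the original answer):

import tensorflow as tf

H, W = 224, 224  # assumed fixed crop size (roughly 95% of the source images)

def your_processing_fn(jpeg_bytes):
    # Decode the jpeg byte string into a uint8 image tensor.
    image = tf.image.decode_jpeg(jpeg_bytes, channels=3)
    # Random crop to a fixed H x W size, as described in the question.
    image = tf.random_crop(image, [H, W, 3])
    # map_fn above declares dtype=tf.float32, so cast accordingly.
    return tf.to_float(image)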