Trying to get Convolution of Single Image - python

I am trying to get the result of a single convolution over an image using tf.keras.backend.conv2d.
The specifications of the input are 227 pixels by 227 pixels, with a channel size of 3 (RGB image.)
The filter size I would like to use is 11x11 and a stride of 4. There is no zero padding included.
I am not married to the idea of using tf.keras.backend.conv2d. I am willing to change methods/packages, just as long as I get a convolved image with the specified requirements above.
Here is the chunk of code I'm trying to make work:
import tensorflow as tf
from tensorflow import keras
import cv2
image = cv2.imread('pic.jpg')
tf.keras.backend.conv2d(image,11,strides=4,data_format="channels_last",dilation_rate=(1))
I get this error message:
InvalidArgumentError: cannot compute Conv2D as input #1(zero-based) was expected to be a double tensor but is a int32 tensor [Op:Conv2D] name: convolution/
If there is anything I can add to clarify, please let me know. I can post the entirety of the code, but most of it is irrelevant, at least in my opinion.
Thank you to whoever takes their time to help me!

You are using the wrong function. What you are using is the convolution op, which takes an input and a filter tensor and performs the convolution. As such, the second argument should be the filter tensor itself. You are trying to pass 11 as the filter tensor which obviously doesn't make sense. What I suspect you want to use is tf.keras.layers.Conv2D which takes care of creating the filter according to some specification and then wraps the convolution op as well. Try this:
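conv_layer = tf.keras.layers.Conv2D(1, 11, 4)
# The layer expects a float tensor with a leading batch dimension.
result = conv_layer(tf.cast(image[None, ...], tf.float32))
This creates an 11x11 filter and a convolution op with stride 4; the last line then calls the op on the image, cast to float and given a batch dimension. I put 1 as the number of filters (first argument) since I don't know what exactly you are trying to do.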
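As a quick sanity check on the result, the spatial size of an unpadded convolution can be worked out by hand. The helper below is a hypothetical name introduced here for illustration, not part of TensorFlow; it applies the usual (input - kernel) // stride + 1 rule to the numbers from the question.

```python
def valid_conv_output_size(input_size: int, kernel_size: int, stride: int) -> int:
    """Output length along one spatial axis for a convolution without padding."""
    return (input_size - kernel_size) // stride + 1


# Numbers from the question: 227x227 input, 11x11 filter, stride 4.
side = valid_conv_output_size(227, 11, 4)
print(side)  # 55
```

So with a single filter, one 227x227 RGB image passed as a batch of one should come back with shape (1, 55, 55, 1).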

Related

Keras backend switch combined with tf.where not working as intended

I have a custom loss function where I want to change values from a one-hot based encoding to values in a certain range to calculate an IOU.
Part of this code looks for where I have a one in a tensor that is zero otherwise. For this I am using tf.where, which returns the location. I have a vector of shape [batch_size,S1,S2,12] where I only care about the last dimension, which is why I take [...,2] of tf.where.
Now it often happens that my prediction is all zeros, because I have background events without any values in them, and my network will also predict an all-zero vector every now and then. This means tf.where will return an empty tensor.
That's why I want to use K.switch to check whether the tensor is empty; if it is, I would like to have zeros returned.
The problem is that K.switch expects the then and else options to have the same shape, but I need my output to have shape [batch_size,S1,S2,1]. I have tried different things but I can't get this to work.
I need to get zeros of shape [batch_size,S1,S2,1] or I need where_box1 to have [batch_size,S1,S2,1] with floats.
The way it's implemented now, K.switch returns an empty vector of zeros when where_box1_temp is empty, which is not what I want.
When I use tf.zeros([batch_size,S1,S2,1]) instead, it complains that the conditions are of different shapes when where_box1_temp is empty.
where_box1_temp = tf.where(y_pred[...,C+1:C+13])[...,2]
where_box1 = K.switch(tf.equal(tf.size(where_box1_temp),0) ,
tf.zeros_like(where_box1_temp) , where_box1_temp)
So I found a workaround, maybe this is helpful for someone else:
where_box1_temp = tf.where(y_pred[...,C+1:C+13],[1,2,3,4,5,6,7,8,9,10,11,12],0)
where_box1 = tf.reshape(K.sum(where_box1_temp,axis=3),[batch_size,5,5])
This gives me a tensor of my desired shape where all background/zero-prediction values are 0, without having to use K.switch or deal with empty dimensions.
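The same idea can be checked on a small array. The sketch below uses NumPy as a stand-in for the TensorFlow calls (np.where broadcasts its arguments much like the three-argument tf.where does in TF 2) and a made-up 4-class one-hot layout instead of the 12 classes above. Passing keepdims=True to the sum keeps a trailing axis of size 1, which is the [batch_size,S1,S2,1] shape asked for in the question.

```python
import numpy as np

# Toy stand-in for y_pred[..., C+1:C+13]: shape [batch=1, S1=1, S2=3, classes=4].
one_hot = np.array([[[[0, 0, 1, 0],     # class 3
                      [0, 0, 0, 0],     # background: all zeros
                      [1, 0, 0, 0]]]],  # class 1
                   dtype=np.float32)

# 1-based class positions, broadcast along the last axis.
positions = np.arange(1, one_hot.shape[-1] + 1, dtype=np.float32)

# Put the position where the one-hot entry is set, 0 elsewhere, then sum.
where_box = np.where(one_hot > 0, positions, 0.0).sum(axis=-1, keepdims=True)

print(where_box.shape)                  # (1, 1, 3, 1)
print(where_box[0, 0, :, 0].tolist())   # [3.0, 0.0, 1.0]
```

All-zero (background) cells come out as 0 without any special casing, so no empty tensor ever appears.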

TensorFlow Federated: How can I write an Input Spec for a model with more than one input

I'm trying to make an image captioning model using the federated learning library provided by tensorflow, but I'm stuck at this error
Input 0 of layer dense is incompatible with the layer: : expected min_ndim=2, found ndim=1.
this is my input_spec:
input_spec=collections.OrderedDict(x=(tf.TensorSpec(shape=(2048,), dtype=tf.float32), tf.TensorSpec(shape=(34,), dtype=tf.int32)), y=tf.TensorSpec(shape=(None), dtype=tf.int32))
The model takes image features as the first input and a list of vocabulary as a second input, but I can't express this in the input_spec variable. I tried expressing it as a list of lists but it still didn't work. What can I try next?
Great question! It looks to me like this error is coming out of TensorFlow proper--indicating that you probably have the correct nested structure, but the leaves may be off. Your input spec looks like it "should work" from TFF's perspective, so it is probably slightly mismatched with the data you have.
The first thing I would try--if you have an example tf.data.Dataset which will be passed in to your client computation, you can simply read input_spec directly off this dataset as the element_spec attribute. This would look something like:
# ds = example dataset
input_spec = ds.element_spec
This is the easiest path. If you have something like "lists of lists of numpy arrays", there is still a way for you to pull this information off the data itself--the following code snippet should get you there:
# data = list of list of numpy arrays
input_spec = tf.nest.map_structure(lambda x: tf.TensorSpec(x.shape, x.dtype), data)
Finally, if you have a list of lists of tf.Tensors, TensorFlow provides a similar function:
# tensor_structure = list of lists of tensors
tf.nest.map_structure(tf.TensorSpec.from_tensor, tensor_structure)
In short, I would recommend not specifying input_spec by hand, but rather letting the data tell you what its input spec should be.
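As a minimal sketch of reading the spec off the data: the arrays below are made-up placeholders with the shapes from the question, and the batching step is an assumption about how the client data is prepared. If the real dataset is batched, element_spec will carry a leading batch dimension (for example (None, 2048) rather than (2048,)), which is one plausible source of the ndim mismatch in the error.

```python
import collections

import numpy as np
import tensorflow as tf

# Placeholder data: 4 examples with the per-example shapes from the question.
features = collections.OrderedDict(
    x=(np.zeros((4, 2048), dtype=np.float32),  # image features
       np.zeros((4, 34), dtype=np.int32)),     # token ids
    y=np.zeros((4,), dtype=np.int32),
)

# Assumption: clients feed batched data to the model.
ds = tf.data.Dataset.from_tensor_slices(features).batch(2)

# Let the dataset report its own structure instead of writing it by hand.
input_spec = ds.element_spec
print(input_spec["x"][0].shape)  # (None, 2048)
print(input_spec["x"][1].shape)  # (None, 34)
```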

how to correctly use tf.function with a TensorFlow Dataset

I'm trying to use TF Datasets with a @tf.function to perform some preprocessing on a directory of images. Inside the tf.function the image file is read as a raw string tensor, and I'm trying to take a slice from that tensor. The slice, the first 13 characters, represents the header info of .ppm images. I get an error: ValueError: Shape must be rank 1 but is rank 0 for 'Slice' (op: 'Slice') with input shapes: [], [1], [1].
Initially I was trying to directly slice the .numpy() attribute of the tensor (the filepath input parameter to the tf.function), but I think it is semantically wrong to do this inside a tf.function. It also didn't work, as the filepath input tensor does not have a numpy() attribute (I don't understand why). Outside of the tf.function, e.g. in a Jupyter notebook cell, I can iterate over the dataset and get individual items which have a numpy attribute, and do a slice and all subsequent processing on them just fine.
I do realize there may be a gap in my understanding of how TF works (I am using TF 2.0), so I hope someone can clarify what I missed in my readings. The purpose of the tf.function is to convert the ppm images to png, so this function has a side effect, but I did not get far enough to find out whether that is possible.
Here's the code:
@tf.function
def ppm_to_png(filepath):
    ppm_bytes = tf.io.read_file(filepath) #.numpy()
    bytes_header = tf.slice(ppm_bytes, [0], [13])
    # bytes_header = ppm_bytes[:13].eval() # this did not work either with similar error msg
.
.
.
import glob
import os
files = glob.glob(os.path.join(data_dir, '00000/*.ppm'))
dataset = tf.data.Dataset.from_tensor_slices(files)
png_filepaths = dataset.map(ppm_to_png, num_parallel_calls=tf.data.experimental.AUTOTUNE)
To manipulate string values in TF, have a look at the tf.strings namespace.
In this case, you can use tf.strings.substr:
@tf.function
def ppm_to_png(filepath):
    ppm_bytes = tf.io.read_file(filepath)
    bytes_header = tf.strings.substr(ppm_bytes, 0, 13)
    tf.print(bytes_header)
tf.slice only operates on the Tensor objects, and doesn't work on their elements. Here, ppm_bytes is a scalar Tensor containing a single element of type tf.string, and whose value is the entire string contents of the file. So when you call tf.slice, it only looks at the scalar bit, and is not smart enough to realize that you actually want to take a slice of that element instead.
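To make the difference concrete, here is a small sketch that skips the file read and uses an in-memory stand-in for the file contents (the byte string and the function name are made up for illustration). The tensor has rank 0, which is why tf.slice with a one-element begin and size fails, while tf.strings.substr indexes into the bytes of the string itself.

```python
import tensorflow as tf

# Made-up stand-in for tf.io.read_file(filepath): a 13-byte PPM header plus pixel bytes.
ppm_bytes = tf.constant(b"P6\n64 48\n255\n" + b"\x00" * 12)

print(ppm_bytes.shape)  # () : a scalar string tensor, so there is no axis to slice


@tf.function
def read_header(raw):
    # Take 13 bytes starting at byte 0 of the string element.
    return tf.strings.substr(raw, 0, 13)


bytes_header = read_header(ppm_bytes)
print(bytes_header.numpy())  # b'P6\n64 48\n255\n'
```

This also hints at why .numpy() was missing in the question: inside a function traced by tf.function (or passed to Dataset.map), the arguments are symbolic graph tensors, and a concrete value only exists on the eager result returned outside the function.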

How to update a sub-tensor inside a tensor in tensorflow?

I'm working with MNIST and I have a tensor of gradients with size [?,28,28,1] and I want to zero out a few of the [28,28,1] sub-tensors inside it, how should I accomplish this?
I know the indices (as a list) where I need to zero out the sub-tensors. I tried doing something like this (given below), but tf.scatter_update can only change variables, not tensors. I also tried stacking up the required sub-tensors of zeroes and ones but couldn't build up the required result.
dy_dx, = tf.gradients(loss, x_adv)
zeroes = tf.zeros(dy_dx[0].get_shape(), tf.float32)
dy_dx = tf.scatter_update(dy_dx, indices, zeroes)
Thanks!
I'd suggest creating a TensorFlow constant with zeros at the locations you want to zero out and ones everywhere else. Then you could create an op that uses tf.multiply to do elementwise multiplication of the constant and dy_dx. Depending on the structure of your graph, you might need to feed the result to dy_dx in your next call to session.run; you can replace any Tensor with feed data, including variables and constants.
Incidentally, if you just want to apply dropout to the input layer, you can use tf.layers.dropout.
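A minimal sketch of the masking idea, written with NumPy so it can be run on its own; the function name and the example indices are made up for illustration. The mask has shape [batch, 1, 1, 1], so it broadcasts over each [28,28,1] sub-tensor. An array like this could then be turned into a constant (or fed in) and applied with tf.multiply as described above.

```python
import numpy as np


def zero_out_examples(grads, indices):
    """Return a copy of grads with the sub-tensors at `indices` set to zero."""
    # One mask value per example; the trailing axes of size 1 broadcast over [28, 28, 1].
    mask = np.ones((grads.shape[0], 1, 1, 1), dtype=grads.dtype)
    mask[list(indices)] = 0.0
    return grads * mask


# Stand-in for dy_dx with a batch of 4.
dy_dx = np.ones((4, 28, 28, 1), dtype=np.float32)
masked = zero_out_examples(dy_dx, indices=[1, 3])

# Per-example sums: untouched examples keep 28 * 28 = 784, masked ones are 0.
print(masked.sum(axis=(1, 2, 3)).tolist())  # [784.0, 0.0, 784.0, 0.0]
```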

Multiple issues with axes while implementing a Seq2Seq with attention in CNTK

I'm trying to implement a Seq2Seq model with attention in CNTK, something very similar to CNTK Tutorial 204. However, several small differences lead to various issues and error messages, which I don't understand. There are many questions here, which are probably interconnected and all stem from some single thing I don't understand.
Note (in case it's important). My input data comes from MinibatchSourceFromData, created from NumPy arrays that fit in RAM, I don't store it in a CTF.
ins = C.sequence.input_variable(input_dim, name="in", sequence_axis=inAxis)
y = C.sequence.input_variable(label_dim, name="y", sequence_axis=outAxis)
Thus, the shapes are [#, *](input_dim) and [#, *](label_dim).
Question 1: When I run the CNTK 204 Tutorial and dump its graph into a .dot file using cntk.logging.plot, I see that its input shapes are [#](-2,). How is this possible?
Where did the sequence axis (*) disappear?
How can a dimension be negative?
Question 2: In the same tutorial, we have attention_axis = -3. I don't understand this. In my model there are 2 dynamic axes and 1 static axis, so the "third to last" axis would be #, the batch axis. But attention definitely shouldn't be computed over the batch axis.
I hoped that looking at the actual axes in the tutorial code would help me understand this, but the [#](-2,) issue above made this even more confusing.
Setting attention_axis to -2 gives the following error:
RuntimeError: Times: The left operand 'Placeholder('stab_result', [#, outAxis], [128])'
rank (1) must be >= #axes (2) being reduced over.
during creation of the training-time model:
def train_model(m):
    @C.Function
    def model(ins: InputSequence[Tensor[input_dim]],
              labels: OutputSequence[Tensor[label_dim]]):
        past_labels = Delay(initial_state=C.Constant(seq_start_encoding))(labels)
        return m(ins, past_labels) #<<<<<<<<<<<<<< HERE
    return model
where stab_result is a Stabilizer right before the final Dense layer in the decoder. I can see in the dot-file that there are spurious trailing dimensions of size 1 that appear in the middle of the AttentionModel implementation.
Setting attention_axis to -1 gives the following error:
RuntimeError: Binary elementwise operation ElementTimes: Left operand 'Output('Block346442_Output_0', [#, outAxis], [64])'
shape '[64]' is not compatible with right operand
'Output('attention_weights', [#, outAxis], [200])' shape '[200]'.
where 64 is my attention_dim and 200 is my attention_span. As I understand, the elementwise * inside the attention model definitely shouldn't be conflating these two together, therefore -1 is definitely not the right axis here.
Question 3: Is my understanding above correct? What should be the right axis and why is it causing one of the two exceptions above?
Thanks for the explanations!
First, some good news: A couple of things have been fixed in the AttentionModel in the latest master (will be generally available with CNTK 2.2 in a few days):
You don't need to specify an attention_span or an attention_axis. If you don't specify them and leave them at their default values, the attention is computed over the whole sequence. In fact these arguments have been deprecated.
If you do the above, the 204 notebook runs 2x faster, so the 204 notebook does not use these arguments anymore.
A bug has been fixed in the AttentionModel and it now faithfully implements the Bahdanau et al. paper.
Regarding your questions:
The dimension is not negative. We use certain negative numbers in various places to mean certain things: -1 is a dimension that will be inferred once based on the first minibatch, -2 is I think the shape of a placeholder, and -3 is a dimension that will be inferred with each minibatch (such as when you feed variable sized images to convolutions). I think if you print the graph after the first minibatch, you should see all shapes are concrete.
attention_axis is an implementation detail that should have been hidden. Basically attention_axis=-3 will create a shape of (1, 1, 200), attention_axis=-4 will create a shape of (1, 1, 1, 200) and so on. In general anything more than -3 is not guaranteed to work and anything less than -3 just adds more 1s without any clear benefit. The good news of course is that you can just ignore this argument in the latest master.
TL;DR: If you are in master (or starting with CNTK 2.2 in a few days) replace AttentionModel(attention_dim, attention_span=200, attention_axis=-3) with
AttentionModel(attention_dim). It is faster and does not contain confusing arguments. Starting from CNTK 2.2 the original API is deprecated.
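The shape pattern described for attention_axis can be written out as a tiny helper. This is only an illustration of the rule stated above, with a hypothetical function name; it is not how CNTK implements AttentionModel.

```python
def attention_window_shape(attention_span, attention_axis):
    """Shape implied by the rule above: leading 1s, then the span."""
    if attention_axis >= 0:
        raise ValueError("attention_axis is expected to be negative")
    return (1,) * (-attention_axis - 1) + (attention_span,)


print(attention_window_shape(200, -3))  # (1, 1, 200)
print(attention_window_shape(200, -4))  # (1, 1, 1, 200)
```

Extrapolating the same rule to -1 would give (200,), which would be consistent with the [200] shape reported in the ElementTimes error, though that is an inference rather than something the answer states.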
