Proper use of tf.layers.MaxPooling - python

I'm building a model in TensorFlow using tf.layers objects. When I run the following code using tf.layers.MaxPooling2D, the spatial size of my feature maps does not shrink. I've only recently switched from Keras to using TensorFlow directly, so I presume I'm misunderstanding the usage.
import tensorflow as tf
import numpy as np
features = tf.constant(np.random.random((20,128,128,3)), dtype=tf.float32)
y_true = tf.constant(np.random.random((20,1)), dtype=tf.float32)
print('features = %s' % features)
conv = tf.layers.Conv2D(32,(2,2),padding='same')(features)
print('conv = %s' % conv)
pool = tf.layers.MaxPooling2D((2,2),(1,1),padding='same')(conv)
print('pool = %s' % pool)
# and so on ...
I see this output:
features = Tensor("Const:0", shape=(20, 128, 128, 3), dtype=float32)
conv = Tensor("conv2d/BiasAdd:0", shape=(20, 128, 128, 32), dtype=float32)
pool = Tensor("max_pooling2d/MaxPool:0", shape=(20, 128, 128, 32), dtype=float32)
I was expecting the output of the MaxPool layer to have a shape of (20, 64, 64, 32).
Am I using this correctly?

If you want to downsample your feature map by a factor of 2, you should use a stride of 2.
In [1]: tf.layers.MaxPooling2D(2, 2, padding='same')(conv)
Out[1]: <tf.Tensor 'max_pooling2d/MaxPool:0' shape=(20, 64, 64, 32) dtype=float32>
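In the snippet from the question, the second positional argument, strides=(1, 1), is what kept the map at 128x128. Spelling the fix out with keyword arguments (a restatement of the answer above, reusing the conv tensor from the question):
pool = tf.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same')(conv)
print('pool = %s' % pool)  # shape=(20, 64, 64, 32)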

Related

How can I solve a problem with input size in a Keras model?

I have created a model with the functional API with
shape = (128,128,1)
input = Input(shape=shape)
print(input.shape)
x = Conv2D(8, kernel_size=(7, 7), padding="valid", name='input_conv',
           activation='relu')(input)
etc...
The shape has the expected size; the model summary shows:
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 128, 128, 1)]     0
input_conv (Conv2D)          (None, 122, 122, 8)       400
I can fit and save the model. When I load the model and try to make a prediction with
print(f'''image shape {im.shape} {type(im)}''')
p = model.predict(im)
I get the error
image shape (128, 128, 1) <class 'numpy.ndarray'>
WARNING:tensorflow:Model was constructed with shape (None, 128, 128, 1) for input KerasTensor(type_spec=TensorSpec(shape=(None, 128, 128, 1), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'"), but it was called on an input with incompatible shape (32, 128, 1, 1).
im is a numpy array with shape (128, 128, 1), so I can't figure out where (32, 128, 1, 1) is coming from.
Thanks
Very sorry, problem between chair and keyboard.
p = model.predict(im.reshape(1,128,128,1))
Solved the thing.
32 is the batch size. You need to add one more dimension before predicting since the model expects batches of images. So you need to send a batch of a single image:
import numpy as np
print(f'''image shape {im.shape} {type(im)}''')
p = model.predict(np.expand_dims(im, 0))
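Equivalently (this variant is my own addition, not from the answer), indexing with np.newaxis adds the same leading batch axis, just like the im.reshape(1, 128, 128, 1) from the self-answer:
p = model.predict(im[np.newaxis, ...])  # im viewed as a batch of shape (1, 128, 128, 1)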

How to use Conv2D in multiple images input?

I want to use multiple images as the input to the network, and I want to add Conv2D layers, something like this:
from tensorflow.keras.layers import *
from tensorflow.keras.models import Sequential
model = Sequential([
    Input(shape=(1, 128, 128, 1)),
    Conv2D(32, 3),
    Flatten(),
])
But this code raises the error: Input 0 of layer conv2d_40 is incompatible with the layer: expected ndim=4, found ndim=5. Full shape received: [None, 1, 128, 128, 1]
But the code below is working fine:
model = Sequential([
    Input(shape=(1, 512, 512, 1)),
    Dense(32),
    Flatten(),
])
I know I can add multiple Input layers, but I want to know whether there is a way to make it work like this.
I mean I want to use data of input shape [NUMBER_OF_IMAGES, WIDTH, HEIGHT, N_CHANNELS],
where NUMBER_OF_IMAGES is not the total number of images, but the number of images in the current input.
Conv2D expects 4D input; you can't change that. I'm not exactly sure what you're trying to accomplish, but you could use Conv3D instead:
from tensorflow.keras.layers import *
from tensorflow.keras.models import Sequential
import tensorflow as tf
model = Sequential([
    Input(shape=(None, 128, 128, 1)),
    Conv3D(32, kernel_size=(1, 3, 3)),
    Flatten()
])
multiple_images = tf.random.uniform((10, 10, 128, 128, 1), dtype=tf.float32)
model(multiple_images)
<tf.Tensor: shape=(10, 5080320), dtype=float32, numpy=
array([[-0.26742983, -0.09689523, -0.12120364, ..., -0.02987139,
0.05515741, 0.12026916],
[-0.18898709, 0.12448274, -0.17439063, ..., 0.23424357,
-0.06001307, -0.13852882],
[-0.14464797, 0.26356792, -0.34748033, ..., 0.07819699,
-0.11639086, 0.10701762],
...,
[-0.1536693 , 0.13642962, -0.18564 , ..., 0.07165999,
-0.0173855 , -0.04348694],
[-0.32320747, 0.09207243, -0.22274591, ..., 0.11940736,
-0.02635285, -0.1140241 ],
[-0.21126074, -0.00094431, -0.10933039, ..., 0.06002581,
-0.09649743, 0.09335127]], dtype=float32)>
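As a sanity check on that output shape (my own note, not part of the original answer): with the default padding='valid' and kernel_size=(1, 3, 3), each 128x128 image shrinks to 126x126 while the images axis is untouched, so Flatten yields 10 * 126 * 126 * 32 = 5,080,320 features per sample, matching the (10, 5080320) shape above.
# 10 images per sample, 126x126 spatial output, 32 filters
assert 10 * 126 * 126 * 32 == 5080320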

tf.nn.conv2d in numpy or scipy (with 4-d weights)?

For part of an embedded project, I trained a network in Tensorflow, and now I'm reloading the variables in a Numpy/Scipy-based model script. However, I am unclear on how to redo the conv2d steps with the weights I have.
I've looked at this link: Difference between Tensorflow convolution and numpy convolution,
but I haven't made the connection to a problem where the weights are four-dimensional.
This is my Tensorflow code:
# input shape: (1, 224, 224, 1)
weight1 = tf.Variable(tf.truncated_normal([3, 3, 1, 16], stddev=stddev))
conv1 = tf.nn.conv2d(input, weight1, strides=[1, 1, 1, 1], padding='SAME')
# conv1 shape: (1, 224, 224, 16)
weight2 = tf.Variable(tf.truncated_normal([3, 3, 16, 32], stddev=stddev))
conv2 = tf.nn.conv2d(conv1, weight2, strides=[1, 1, 1, 1], padding='SAME')
# conv2 shape: (1, 224, 224, 32)
And when I try to use convolve functions from Scipy or Numpy libraries, the output dimensions are incorrect:
from scipy.ndimage.filters import convolve
conv1 = convolve(input, weight1[::-1])
# conv1 shape: (1, 224, 224, 1)
conv2 = convolve(conv1, weight2[::-1])
# conv2 shape: (1, 224, 224, 16)
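For reference, here is a minimal NumPy sketch of what tf.nn.conv2d computes with a 4-d (kh, kw, in_ch, out_ch) kernel at stride 1 with 'SAME' padding (my own illustration, assuming odd kernel sizes; note also that tf.nn.conv2d is a cross-correlation, so the kernel is not flipped the way scipy's convolve flips it):
import numpy as np

def conv2d_same(x, w):
    # x: (batch, height, width, in_ch), w: (kh, kw, in_ch, out_ch)
    kh, kw, in_ch, out_ch = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw), (0, 0)))  # zero padding = 'SAME' for odd kernels
    batch, h, wd, _ = x.shape
    out = np.zeros((batch, h, wd, out_ch), dtype=x.dtype)
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + kh, j:j + kw, :]  # (batch, kh, kw, in_ch)
            # contract over kh, kw, in_ch -> (batch, out_ch)
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out

# conv1 = conv2d_same(input, weight1)  # (1, 224, 224, 16)
# conv2 = conv2d_same(conv1, weight2)  # (1, 224, 224, 32)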

ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concat axis

I'm trying to implement a special type of neural network with Keras functional API, as seen below:
But I'm having a problem with the concatenate layer:
ValueError: A "Concatenate" layer requires inputs with matching shapes
except for the concat axis. Got inputs shapes: [(None, 160, 160, 384),
(None, 160, 160, 48)]
Note: from my research I believe this question is not a duplicate. I've seen this question and this post (translated with Google), but the suggested fixes don't seem to work (if anything, they make the problem slightly worse).
Here's the code of the neural network before concat layer:
from keras.layers import Input, Dense, Conv2D, ZeroPadding2D, MaxPooling2D, BatchNormalization, concatenate
from keras.activations import relu
from keras.initializers import RandomUniform, Constant, TruncatedNormal
# Network 1, Layer 1
screenshot = Input(shape=(1280, 1280, 0), dtype='float32', name='screenshot')
# padded1 = ZeroPadding2D(padding=5, data_format=None)(screenshot)
conv1 = Conv2D(filters=96, kernel_size=11, strides=(4, 4), activation=relu, padding='same')(screenshot)
# conv1 = Conv2D(filters=96, kernel_size=11, strides=(4, 4), activation=relu, padding='same')(padded1)
pooling1 = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(conv1)
normalized1 = BatchNormalization()(pooling1) # https://stats.stackexchange.com/questions/145768/importance-of-local-response-normalization-in-cnn
# Network 1, Layer 2
# padded2 = ZeroPadding2D(padding=2, data_format=None)(normalized1)
conv2 = Conv2D(filters=256, kernel_size=5, activation=relu, padding='same')(normalized1)
# conv2 = Conv2D(filters=256, kernel_size=5, activation=relu, padding='same')(padded2)
normalized2 = BatchNormalization()(conv2)
# padded3 = ZeroPadding2D(padding=1, data_format=None)(normalized2)
conv3 = Conv2D(filters=384, kernel_size=3, activation=relu, padding='same',
               kernel_initializer=TruncatedNormal(stddev=0.01),
               bias_initializer=Constant(value=0.1))(normalized2)
# conv3 = Conv2D(filters=384, kernel_size=3, activation=relu, padding='same',
#                kernel_initializer=RandomUniform(stddev=0.1),
#                bias_initializer=Constant(value=0.1))(padded3)
# Network 2, Layer 1
textmaps = Input(shape=(160, 160, 128), dtype='float32', name='textmaps')
txt_conv1 = Conv2D(filters=48, kernel_size=1, activation=relu, padding='same',
                   kernel_initializer=TruncatedNormal(stddev=0.01),
                   bias_initializer=Constant(value=0.1))(textmaps)
# (Network 1 + Network 2), Layer 1
merged = concatenate([conv3, txt_conv1], axis=1)
This is how interpreter evaluates variables conv3 and txt_conv1:
>>> conv3
<tf.Tensor 'conv2d_3/Relu:0' shape=(?, 160, 160, 384) dtype=float32>
>>> txt_conv1
<tf.Tensor 'conv2d_4/Relu:0' shape=(?, 160, 160, 48) dtype=float32>
This is how the interpreter evaluates txt_conv1 and conv3 variables after setting image_data_format to channels_first:
>>> conv3
<tf.Tensor 'conv2d_3/Relu:0' shape=(?, 384, 160, 0) dtype=float32>
>>> txt_conv1
<tf.Tensor 'conv2d_4/Relu:0' shape=(?, 48, 160, 128) dtype=float32>
Both of the layers have shapes which are not actually described in the architecture.
Is there any way to solve this problem? Maybe I didn't write the appropriate code (I'm new to Keras).
P.S.
I know the code above is not well organized; I'm just testing.
Thank you!
You should change the axis to -1 in the concatenate layer since the shapes of the two tensors that you want to concatenate only differ in their last dimension. The resulting tensor will then be of shape (?, 160, 160, 384 + 48).
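Concretely, the only change needed in the snippet above is the last line (432 = 384 + 48 channels):
merged = concatenate([conv3, txt_conv1], axis=-1)  # shape (?, 160, 160, 432)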

convert Lasagne to Keras code (CNN -> LSTM)

I would like to convert this Lasagne code:
net = {}
net['input'] = lasagne.layers.InputLayer((100, 1, 24, 113))
net['conv1/5x1'] = lasagne.layers.Conv2DLayer(net['input'], 64, (5, 1))
net['shuff'] = lasagne.layers.DimshuffleLayer(net['conv1/5x1'], (0, 2, 1, 3))
net['lstm1'] = lasagne.layers.LSTMLayer(net['shuff'], 128)
to Keras code. So far I've come up with this:
multi_input = Input(shape=(1, 24, 113), name='multi_input')
y = Conv2D(64, (5, 1), activation='relu', data_format='channels_first')(multi_input)
y = LSTM(128)(y)
But I get the error: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4
Solution
from keras.layers import Input, Conv2D, LSTM, Permute, Reshape
multi_input = Input(shape=(1, 24, 113), name='multi_input')
print(multi_input.shape) # (?, 1, 24, 113)
y = Conv2D(64, (5, 1), activation='relu', data_format='channels_first')(multi_input)
print(y.shape) # (?, 64, 20, 113)
y = Permute((2, 1, 3))(y)
print(y.shape) # (?, 20, 64, 113)
# This line is what you missed
# ==================================================================
y = Reshape((int(y.shape[1]), int(y.shape[2]) * int(y.shape[3])))(y)
# ==================================================================
print(y.shape) # (?, 20, 7232)
y = LSTM(128)(y)
print(y.shape) # (?, 128)
Explanations
I put the documents of Lasagne and Keras here so you can do cross-referencing:
Lasagne
Recurrent layers can be used similarly to feed-forward layers except
that the input shape is expected to be (batch_size, sequence_length, num_inputs)
Keras
Input shape
3D tensor with shape (batch_size, timesteps, input_dim).
Basically the API is the same, but Lasagne probably does reshape for you (I need to check the source code later). That's why you got this error:
Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4
since the tensor shape after Conv2D is (?, 64, 20, 113), which has ndim=4.
Therefore, the solution is to reshape it to (?, 20, 7232).
Edit
Confirmed with the Lasagne source code: it does this flattening for you:
num_inputs = np.prod(input_shape[2:])
So the correct tensor shape as input for LSTM is (?, 20, 64 * 113) = (?, 20, 7232)
Note
Permute is redundant here in Keras since you have to reshape anyway. The reason I put it here is to give a "full translation" from Lasagne to Keras; it does what DimshuffleLayer does in Lasagne.
DimshuffleLayer is, however, needed in Lasagne for the reason mentioned in Edit: the input dimension that the Lasagne LSTM consumes comes from multiplying the last two dimensions together.
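As a side note (my own addition, not part of the original answer): Keras Reshape also accepts -1 for a single unknown dimension, so the reshape line can be written without computing 7232 by hand:
y = Reshape((int(y.shape[1]), -1))(y)  # Keras infers 64 * 113 = 7232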
