Imbalanced aspect ratio of input in CNN

Imbalanced aspect ratio of input in CNN - python

Consider the following model using keras in TensorFlow.
Conv2D(
filter = 2^(5+i) # i = num of times called conv2D
kernel = (3, 3),
strides = (1, 1),
padding = 'valid')
MaxPooling2D(
pool_size = (2, 2))
Layer Output Shape Param
-----------------------------------------------
L0: Shape (50, 250, 1 ) 0
L1: Conv2D (48, 248, 32 ) 320
L2: MaxPooling2D (24, 124, 32 ) 0
L3: Conv2D_1 (22, 122, 64 ) 18496
L4: MaxPooling2D_1 (11, 61, 64 ) 0
L5: Conv2D_2 (9, 59, 128) 73856
L6: MaxPooling2D_2 (4, 29, 128) 0
L7: Conv2D_3 (2, 27, 256) 295168 !!
L8: MaxPooling2D_3 (1, 13, 256) 0
L9: Flatten (3328) 0
L10: Dense (512) 1704448 !!!
L11: ...
Here, an input shape with ratio of 1:5 is used. After L8, there cannot be any more convolutional layers as one side is 1. Actually in cases where input_side < kernel_size, there can be no more convolutional layers; the layer is forced to be flattened into a vector with high number of units – resulted from the large shape [1][3] and the large number of filters [2] deep into the network. The Dense layer [4] follows will have a high number of parameters that requires a lot of computation time.
To reduce the number of parameters specific to the problems (highlighted in [x]) above, I think of these methods:
Adding a (1, 2) stride to early layers of Conv2D. (Refer to this thread)
Reduce the number of filters, say, from [32, 64, 128, 256, ...] to [16, 24, 32, 48, ...].
Resize the input data to a square-shaped input so that more Conv2D layers can be applied.
Future reduce the number of units in the first Dense layer, say, from 512 to 128.
My question is, will these method work and how much will they affect the performance of the CNN? Is there any better approach to the problem? Thanks.

First of all, you can try 'same' padding instead of valid. It will save you from these diminishing numbers you are getting, somewhat.
For point 1:
Adding a non-uniform stride is only good if your data has more variation in a certain direction in this case, horizontal.
For point 2:
The number of filters don't help or hurt the way your dimensions are changing. This would hurt your performance if your model was not overfitting.
For point 3:
Resize the input to square shape would seem like a good idea but would lead to unnecessary dead neurons because of all that extras you are adding. I would suggest against it. This may hurt performance and lead to overfitting.
For point 4:
Here, again the number of units don't change the dimensions. This would hurt your performance if your model was not overfitting.
Lastly, your network is deep enough to get good results. Rather than trying to go smaller and smaller. Try increasing your Conv2D layers in between the MaxPools, that would be much better.

Related

Keras CNN for non-image matrix

I have recently started learning about Deep Learning and Reinforcement Learning, and I am trying to figure out how to code a Convolutional Neural Network using Keras for a matrix of 0s and 1s with 10 rows and 3 columns.
The input matrix would look like this for example
[
[1, 0, 0],
[0, 1, 0],
[0, 0, 0],
...
]
The output should be another matrix of 0s and 1s, different from the aforementioned input matrix and with a different number of rows and columns.
The location of 0s and 1s in the output matrix is dependent on the location of the 0s and 1s in the input matrix.
There is also a second output, an array where the values are dependent on the location of the 1 in the input matrix.
I have searched the internet for code examples but couldn't find anything useful.
Edit:
The input to the neural network is a 2D array with 10 rows and each row has 3 columns.
The output (for now at least) is a 2D array with 12 rows and each row has 10 columns (the same as the number of rows in the input 2D array).
This is what I came up with so far and I have no idea if it's correct or not.
nbre_users = 10 # number of rows in the input 2D matrix
nbre_subchannels = 12 # number of rows in the output 2D matrix
model = Sequential()
model.add(Dense(50, input_shape=(nbre_users, 3), kernel_initializer="he_normal" ,activation="relu"))
model.add(Dense(20, kernel_initializer="he_normal", activation="relu"))
model.add(Dense(5, kernel_initializer="he_normal", activation="relu"))
model.add(Flatten())
model.add(Dense(nbre_subchannels))
model.add(Dense(nbre_users, activation = 'softmax'))
model.compile(optimizer=Adam(learning_rate=1e-4), loss='mean_squared_error')
Here is the model summary:

After clarifications, here is my answer.
The problem you are trying to solve seems to be a neural network that transforms a 2D grayscale image of size (10,3,1) to a 2D grayscale image of size (12,10,1).
A 2D grayscale image is nothing but a 2D matrix with an extra axis set to 1.
a = np.array([[0,1,0],
[1,0,1],
[0,1,0]])
a.shape
#OUTPUT = (3,3)
a.reshape((3,3,1)) #reshape to 3,3,1
#OUTPUT -
#array([[[0],
# [1],
# [0]],
#
# [[1],
# [0],
# [1]],
#
# [[0],
# [1],
# [0]]])
So a 2D matrix of (10,3) can be called a 3D image with a single channel (10,3,1). This will allow you to properly apply convolutions to your input.
If this part is clear, then in the forward computation of the network, since you want to ensure that spatial positions of the 1s and 0s are captured, you want to use convolution layers. Using Dense layers here is not the right step.
However, a series convolution operation help to Downsample and image. Since you need an output 2D matrix (gray scale image), you want to Upsample as well. Such a network is called a Deconv network.
The first series of layers convolve over the input, 'flattening' them into a vector of channels. The next set of layers use 2D Conv Transpose operations to change the channels back into a 2D matrix (Gray scale image)
Refer to this image for reference -
Here is a sample code that shows you how you can take a (10,3,1) image to a (12,10,1) image using a deconv net.
from tensorflow.keras import layers, Model
inp = layers.Input((10,3,1)) ##
x = layers.Conv2D(2, (2,2))(inp) ## Convolution part
x = layers.Conv2D(4, (2,2))(x) ##
x = layers.Conv2DTranspose(4, (3,4))(x) ##
x = layers.Conv2DTranspose(2, (2,4))(x) ## Deconvolution part
out = layers.Conv2DTranspose(1, (2,4))(x) ##
model = Model(inp, out)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_33 (InputLayer) [(None, 10, 3, 1)] 0
_________________________________________________________________
conv2d_49 (Conv2D) (None, 9, 2, 2) 10
_________________________________________________________________
conv2d_50 (Conv2D) (None, 8, 1, 4) 36
_________________________________________________________________
conv2d_transpose_46 (Conv2DT (None, 10, 4, 4) 196
_________________________________________________________________
conv2d_transpose_47 (Conv2DT (None, 11, 7, 2) 66
_________________________________________________________________
conv2d_transpose_48 (Conv2DT (None, 12, 10, 1) 17
=================================================================
Total params: 325
Trainable params: 325
Non-trainable params: 0
_________________________________________________________________
Obviously, feel free to add activations, dropouts, pooling layers etc etc etc. The above code just shows how you can use downsampling and upsampling to get from a given single-channel image to another single-channel image.
On a side note - I would really advise that you spend some time understanding how CNNs work. Deconv nets are complex and if you are solving a problem that involves them, before properly understanding how 2D CNNs work, it may cause some foundational problems especially if you are starting to learn this domain.

You can use 1D convolutional layers if you want to convolve in a single spatial, which from what I understood what you want.
e.g.
# assuming 3x10 matrix with single batch
input_shape = (1, 3, 10)
y = tf.keras.layers.Conv1D(32, 3, activation='relu',input_shape=input_shape[1:])(x)

ML wrong prediction on Japan Crossword puzzle

I’m trying to study machine learning in hands-on way. I found exercise for myself to create neural network that solves “Japan crosswords” for fixed size images (128*128).
Very simple example (4*4) demonstrates the conception: black & white picture encoded by top and left matrices. Number in matrix means continues length of black line. Easy to prove left and top matrix have dimension at max (N*(N/2)) and ((N/2)*N) correspondingly.
I have a python generator that creates random b&w images and 2 reduced matrices. Top and left matrices are fed as input (left is transposed to match top) and b&w as an expected output. Input is treated as 3-dim (128 * 64 * 2) where 2 – is top and left correspondingly.
Following is my current topology that try to build function (128 * 64 * 2) -> (128, 128, 1)
Model: "model"
Layer (type) Output Shape Param #
interlaced_reduce (InputLaye [(None, 128, 64, 2)] 0
small_conv (Conv2D) (None, 128, 64, 32) 288
leaky_re_lu (LeakyReLU) (None, 128, 64, 32) 0
medium_conv (Conv2D) (None, 128, 64, 64) 8256
leaky_re_lu_1 (LeakyReLU) (None, 128, 64, 64) 0
large_conv (Conv2D) (None, 128, 64, 128) 32896
leaky_re_lu_2 (LeakyReLU) (None, 128, 64, 128) 0
up_sampling2d (UpSampling2D) (None, 128, 128, 128) 0
dropout (Dropout) (None, 128, 128, 128) 0
dense (Dense) (None, 128, 128, 1) 129
Total params: 41,569
Trainable params: 41,569
Non-trainable params: 0
After train on 50 images I got the following statistic (please note, I tried to normalize input matrices to [0,1] without any success, current statistic demonstrate non-normalized case) :
...
Epoch 50/50 2/2 [==============================] - 1s 687ms/step -loss: 18427.2871 - mae: 124.9277
Then prediction produces following:
You can see left – expected random image and right – result of prediction. In prediction I intentionally use grey-scaled image to understand how close my result to target. But as you can see – the prediction is far from expected and is close to source form of top/left reduce matrices.
So my questions:
1) What layers I’m missing?
2) What should be improved in existing topology?
p.s. this is cross post from Cross Validated Stackexchange, because nobody even viewed question that site

So it's hard to say what model would work best without training and testing the actual model, but from the results you've gotten so far here's a few options you could try.
Try adding a fully connected hidden layer
From the model you posted, it seems that you have a few convolution layers, followed by an up-sampling and dropout layer, and finally a single dense layer for your output nodes. Potentially, adding additional dense layers (for e.g. 128 or more or less nodes) before your final output layer might help. While the multiple convolution layers help the neural net to build up a sort of hierarchical understanding of the image, the hypothesis class might not be complex enough. Adding one or more dense layers might help with this.
Try using a multilayer perceptron
Convolution layers are often used to process images because they help build up a hierarchical understanding of the image that is somewhat scale/shift/rotation invariant. However, considering the problem that you're solving, a global understanding of the input might be more beneficial than identifying shift-invariant features.
As such, one possible option would be to remove the convolution layers and to use a multilayer perceptron (MLP).
Let us think of the input as two matrices of numbers, and the output is a matrix of 1s and 0s that correspond to 'black' and 'white'. You could then try a model with the following layers:
A Flatten layer that takes in your two reduced matrices as inputs and flattens
them
A hidden dense layer, maybe with something like 128 nodes and relu activation. You should experiment with the number of layers, nodes, and activation.
An output dense layer with 16384 (128x128) nodes. You could apply a softmax activation to this layer which could help the optimiser during the training process. Then, when creating your final image, set values < 0.5 to 0 and values >= 0.5 to 1, and reshape and reformat the matrix into a square image.
Of course, no guarantees that an MLP would work well, but if often does especially when given sufficient amounts of data (perhaps in the 1000s or more number of training examples).
Try using a deterministic algorithm
Looking at the structure of this problem, it seems that it could be solved more appropriately with a deterministic algorithm, which would fall under more the branch of traditional artificial intelligence rather than deep learning. This is also another potential route to explore.

The model you build is a conventional model (seen by the use of Conv2D). This layer are good in analyzing something given its neighbors. Making them very powerful for image classification or segmentation.
In your case the result of a pixels is depending on the whole line and column.
Neural networks seems to be unsuited for your problem, but if you want to continue look in to replacing conv layers with Conv(1xN) and Conv(Nx1). It will still be very hard to make it work.
The hard way: These puzzle exist out of a strong recurrent process. Each step the correct spots get filled int with a zero or one. Based on those the next get filled in. So a recurrent neural network would make most sense to me. Where the convolution is used to have the prediction of the neighbors influence its current prediction

Constraint on dimensions of activation/feature map in convolutional network

Let's say input to intermediate CNN layer is of size 512×512×128 and that in the convolutional layer we apply 48 7×7 filters at stride 2 with no padding. I want to know what is the size of the resulting activation map?
I checked some previous posts (e.g., here or here) to point to this Stanford course page. And the formula given there is (W − F + 2P)/S + 1 = (512 - 7)/2 + 1, which would imply that this set up is not possible, as the value we get is not an integer.
However if I run the following snippet in Python 2.7, the code seems to suggest that the size of activation map was computed via (512 - 6)/2, which makes sense but does not match the formula above:
>>> import torch
>>> conv = torch.nn.Conv2d(in_channels=128, out_channels=48, kernel_size=7, stride=2, padding=0)
>>> conv
Conv2d(128, 48, kernel_size=(7, 7), stride=(2, 2))
>>> img = torch.rand((1, 128, 512, 512))
>>> out = conv(img)
>>> out.shape
(1, 48, 253, 253)
Any help in understanding this conundrum is appreciated.

Here is the formula being used in pytorch: conv2d(go to the shape section)
Also, as far as I know, this is the best tutorial on this subject.
Bonus: here is a neat visualizer for conv calculations.

Shape of 1D convolution output on a 2D data using keras

I am trying to implement a 1D convolution on a time series classification problem using keras. I am having some trouble interpreting the output size of the 1D convolutional layer.
I have my data composed of the time series of different features over a time interval of 128 units and I apply a 1D convolutional layer:
x = Input((n_timesteps, n_features))
cnn1_1 = Conv1D(filters = 100, kernel_size= 10, activation='relu')(x)
which after compilation I obtain the following shapes of the outputs:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_26 (InputLayer) (None, 128, 9) 0
_________________________________________________________________
conv1d_28 (Conv1D) (None, 119, 100) 9100
I was assuming that with 1D convolution, the data is only convoluted across the time axis (axis 1) and the size of my output would be:
119, 100*9. But I guess that the network is performing some king of operation across the feature dimension (axis 2) and I don't know which operation is performing.
I am saying this because what I interpret as 1d convolution is that the features shapes must be preserved because I am only convolving the time domain: If I have 9 features, then for each filter I have 9 convolutional kernels, each of these applied to a different features and convoluted across the time axis. This should return 9 convoluted features for each filter resulting in an output shape of 119, 9*100.
However the output shape is 119, 100.
Clearly something else is happening and I can't understand it or get it.
where am I failing my reasoning? How is the 1d convolution performed?
I add one more comment which is my comment on one of the answers provided:
I understand the reduction from 128 to 119, but what I don't understand is why the feature dimension changes. For example, if I use
Conv1D(filters = 1, kernel_size= 10, activation='relu')
, then the output dimension is going to be (None, 119, 1), giving rise to only one feature after the convolution. What is going on in this dimension, which operation is performed to go from from 9 --> 1?

Conv1D needs 3D tensor for its input with shape (batch_size,time_step,feature). Based on your code, the filter size is 100 which means filter converted from 9 dimensions to 100 dimensions. How does this happen? Dot Product.
In above, X_i is the concatenation of k words (k = kernel_size), l is number of filters (l=filters), d is the dimension of input word vector, and p_i is output vector for each window of k words.
What happens in your code?
[n_features * 9] dot [n_features * 9] => [1] => repeat l-times => [1 * 100]
do above for all sequences => [128 * 100]
Another thing that happens here is you did not specify the padding type. According to the docs, by default Conv1d use valid padding which caused your dimension to reduce from 128 to 119. If you need the dimension to be the same as the input you can choose the same option:
Conv1D(filters = 100, kernel_size= 10, activation='relu', padding='same')

It Sums over the last axis, which is the feature axis, you can easily check this by doing the following:
input_shape = (1, 128, 9)
# initialize kernel with ones, and use linear activations
y = tf.keras.layers.Conv1D(1,3, activation="linear", input_shape=input_shape[2:],kernel_initializer="ones")(x)
y :
if you sum x along the feature axis you will get:
x
Now you can easily see that the sum of the first 3 values of sum of x is the first value of convolution, I used a kernel size of 3 to make this verification easier

Keras dimensionality in convolutional layer mismatch

I'm trying to play around with Keras to build my first neural network. I have zero experience and I can't seem to figure out why my dimensionality isn't right. I can't figure it out from their docs what this error is complaining about, or even what layer is causing it.
My model takes in a 32byte array of numbers, and is supposed to give a boolean value on the other side. I want a 1D convolution on the input byte array.
arr1 is the 32byte array, arr2 is an array of booleans.
inputData = np.array(arr1)
inputData = np.expand_dims(inputData, axis = 2)
labelData = np.array(arr2)
print inputData.shape
print labelData.shape
model = k.models.Sequential()
model.add(k.layers.convolutional.Convolution1D(32,2, input_shape = (32, 1)))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
model.add(k.layers.core.Dense(32))
model.add(k.layers.Activation('sigmoid'))
model.compile(loss = 'binary_crossentropy',
optimizer = 'rmsprop',
metrics=['accuracy'])
model.fit(
inputData,labelData
)
The output of the print of shapes are
(1000, 32, 1) and (1000,)
The error I receive is:
Traceback (most recent call last): File "cnn/init.py", line
50, in
inputData,labelData File "/home/steve/Documents/cnn/env/local/lib/python2.7/site-packages/keras/models.py",
line 863, in fit
initial_epoch=initial_epoch) File "/home/steve/Documents/cnn/env/local/lib/python2.7/site-packages/keras/engine/training.py",
line 1358, in fit
batch_size=batch_size) File "/home/steve/Documents/cnn/env/local/lib/python2.7/site-packages/keras/engine/training.py",
line 1238, in _standardize_user_data
exception_prefix='target') File "/home/steve/Documents/cnn/env/local/lib/python2.7/site-packages/keras/engine/training.py",
line 128, in _standardize_input_data
str(array.shape)) ValueError: Error when checking target: expected activation_5 to have 3 dimensions, but got array with shape (1000, 1)

Well It seems to me that you need to google a bit more about convolutional networks :-)
You are applying at each step 32 filters of length 2 over yout sequence. So if we follow the dimensions of the tensors after each layer :
Dimensions : (None, 32, 1)
model.add(k.layers.convolutional.Convolution1D(32,2, input_shape = (32, 1)))
model.add(k.layers.Activation('relu'))
Dimensions : (None, 31, 32)
(your filter of length 2 goes over the whole sequence so the sequence is now of length 31)
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
Dimensions : (None, 30, 32)
(you lose again one value because of your filters of length 2, but you still have 32 of them)
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
Dimensions : (None, 29, 32)
(same...)
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
Dimensions : (None, 28, 32)
Now you want to use a Dense layer on top of that... the thing is that the Dense layer will work as follow on your 3D input :
model.add(k.layers.core.Dense(32))
model.add(k.layers.Activation('sigmoid'))
Dimensions : (None, 28, 32)
This is your output. First thing that I find weird is that you want 32 outputs out of your dense layer... You should have put 1 instead of 32. But even this will not fix your problem. See what happens if we change the last layer :
model.add(k.layers.core.Dense(1))
model.add(k.layers.Activation('sigmoid'))
Dimensions : (None, 28, 1)
This happens because you apply a dense layer to a '2D' tensor. What it does in case you apply a dense(1) layer to an input [28, 32] is that it produces a weight matrix of shape (32,1) that it applies to the 28 vectors so that you find yourself with 28 outputs of size 1.
What I propose to fix this is to change the last 2 layers like this :
model = k.models.Sequential()
model.add(k.layers.convolutional.Convolution1D(32,2, input_shape = (32, 1)))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
model.add(k.layers.convolutional.Convolution1D(32,2))
model.add(k.layers.Activation('relu'))
# Only use one filter so that the output will be a sequence of 28 values, not a matrix.
model.add(k.layers.convolutional.Convolution1D(1,2))
model.add(k.layers.Activation('relu'))
# Change the shape from (None, 28, 1) to (None, 28)
model.add(k.layers.core.Flatten())
# Only one neuron as output to get the binary target.
model.add(k.layers.core.Dense(1))
model.add(k.layers.Activation('sigmoid'))
Now the last two steps will take your tensor from
(None, 29, 32) -> (None, 28, 1) -> (None, 28) -> (None, 1)
I hope this helps you.
ps. if you were wondering what None is , it's the dimension of the batch, you don't feed the 1000 samples at onces, you feed it batch by batch and as the value depends on what is chosen, by convension we put None.
EDIT :
Explaining a bit more why the sequences length loses one value at each step.
Say you have a sequence of 4 values [x1 x2 x3 x4], you want to use your filter of length 2 [f1 f2] to convolve over the sequence. The first value will be given by y1 = [f1 f2] * [x1 x2], the second will be y2 = [f1 f2] * [x2 x3], the third will be y3 = [f1 f2] * [x3 x4]. Then you reached the end of your sequence and cannot go further. You have as a result a sequnce [y1 y2 y3].
This is due to the filter length and the effects at the borders of your sequence. There are multiple options, some pad the sequence with 0's in order to get exactly the same length of output... You can chose that option with the parameter 'padding'. You can read more about this here and find the different values possible for the padding argument here. I encourage you to read this last link, it gives informations about input and output shapes...
From the doc :
padding: One of "valid" or "same" (case-insensitive). "valid" means "no padding". "same" results in padding the input such that the output has the same length as the original input.
the default is 'valid', so you don't pad in your example.
I also recommend you to upgrade your keras version to the latest. Convolution1D is now Conv1D, so you might find the doc and tutorials confusing.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.