I am new to Keras.
My goal is to have a total of 4 max pooling layers, all taking the same input of shape (N, 256). The first layer does global max pooling and gives 1 output. The second layer, with pooling size N / 2 and stride N / 2, gives 2 outputs. The third gives 4 outputs and the fourth gives 8 outputs. Here is my code.
test_x = np.random.rand(N, 256, 1)
model = Sequential()
input1 = Input(shape=test_x.shape, name='input1')
input2 = Input(shape=test_x.shape, name='input2')
input3 = Input(shape=test_x.shape, name='input3')
input4 = Input(shape=test_x.shape, name='input4')
max1 = MaxPooling2D(pool_size=(N, 256), strides=N)(input1)
max2 = MaxPooling2D(pool_size=(N / 2, 256), strides=N / 2)(input2)
max3 = MaxPooling2D(pool_size=(N / 4, 256), strides=N / 4)(input3)
max4 = MaxPooling2D(pool_size=(N / 8, 256), strides=N / 8)(input4)
mrg = Merge(mode='concat')([max1, max2, max3, max4])
After creating 4 max pooling layers, I try to merge them together, but keras gives this error.
ValueError: Dimension 1 in both shapes must be equal, but are 4 and 8 for 'merge_1/concat' (op: 'ConcatV2') with input shapes: [?,1,1,1], [?,2,1,1], [?,4,1,1], [?,8,1,1], [] and with computed input tensors: input[4] = <3>.
How can I solve this issue? Is merging the correct way to achieve my goal in keras?
For concatenation, all dimensions must have the same number of elements, except for the concat dimension itself.
As you can see, your results have shape:
(?, 1, 1, 1)
(?, 2, 1, 1)
(?, 4, 1, 1)
(?, 8, 1, 1)
Naturally, the only possible way to concatenate them is along the second axis (axis=1):
mrg = Concatenate(axis=1)([max1,max2,max3,max4])
But notice that (unless you have specific reasons for that and know exactly what you're doing) this will result in a very weird image, since you're concatenating along a spatial dimension, not the channel dimension.
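The concatenation rule can be checked without Keras. The following is a minimal NumPy sketch, assuming a hypothetical batch size of 3 in place of the unknown "?" dimension and zero-filled arrays standing in for max1 to max4:

```python
import numpy as np

batch = 3  # hypothetical batch size standing in for the unknown "?" dimension

# Stand-ins for max1..max4: identical shapes except along axis 1.
pooled = [np.zeros((batch, n, 1, 1)) for n in (1, 2, 4, 8)]

# Axis 1 is the only axis where the sizes differ, so it is the only valid concat axis.
merged = np.concatenate(pooled, axis=1)
print(merged.shape)  # (3, 15, 1, 1)

# Any other axis fails because the sizes along axis 1 do not match.
try:
    np.concatenate(pooled, axis=-1)
    other_axis_ok = True
except ValueError:
    other_axis_ok = False
print(other_axis_ok)  # False
```

The merged result has 1 + 2 + 4 + 8 = 15 entries along axis 1, which is the spatial axis the answer warns about.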
Related
I'm trying to train a neural network in PyTorch with some input signals. The layers are conv1d. The shape of my input is [100, 10], meaning 100 signals of a length of 10.
But when I execute the training, I have this error:
Given groups=1, weight of size [100, 10, 1], expected input[1, 1, 10] to have 10 channels, but got 1 channels instead
config = [10, 100, 100, 100, 100, 100, 100, 100]
batch_size = 1
epochs = 10
learning_rate = 0.001
kernel_size = 1
class NeuralNet(nn.Module):
    def __init__(self, config, kernel_size=1):
        super().__init__()
        self.config = config
        self.layers = nn.ModuleList([nn.Sequential(
            nn.Conv1d(self.config[i], self.config[i + 1], kernel_size = kernel_size),
            nn.ReLU())
            for i in range(len(self.config)-1)])
        self.last_layer = nn.Linear(self.config[-1], 3)
        self.layers.append(nn.Flatten())
        self.layers.append(self.last_layer)

    def forward(self, x):
        for i, l in enumerate(self.layers):
            x = l(x)
        return x

def loader(train_data, batch_size):
    inps = torch.tensor(train_data[0])
    tgts = torch.tensor(train_data[1])
    inps = torch.unsqueeze(inps, 1)
    dataset = TensorDataset(inps, tgts)
    train_dataloader = DataLoader(dataset, batch_size = batch_size)
    return train_dataloader
At first, my code did not have the unsqueeze(inps) line and I got the exact same error. I then added this line, thinking that the input must have size (num_examples, num_channels, length_of_signal), but it didn't resolve the problem at all.
Thank you in advance for your answers
nn.Conv1d expects input of shape (batch_size, num_of_channels, seq_length). Its parameters set the number of output channels directly (out_channels) and can change the output length through, for example, stride. For a Conv1d layer to work correctly, the number of channels in its input must match its in_channels, which is not the case for the first convolution: input.shape == (batch_size, 1, 10), so num_of_channels = 1, while the convolution in self.layers[0] expects 10 (because in_channels is set by self.config[0] and self.config[0] == 10). To fix this, prepend one more value to config:
config = [10, 100, 100, 100, 100, 100, 100, 100] # as in snippet above
config = [1] + config
At this point the convolutions should work, but there is another obstacle in self.layers: the linear layer at the end. With kernel_size of 1, the batch has shape (batch_size, 100, 10) after the final convolution and (batch_size, 100 * 10) after flatten, while last_layer expects input of shape (batch_size, 100). So, if the sequence length after the final conv layer is known (which is certainly the case with kernel_size of 1, the default stride of 1 and the default padding of 0, since the length stays the same), last_layer should be defined as:
self.last_layer = nn.Linear(final_length * self.config[-1], 3)
and in the snippet above final_length can be set to 10 (since the conditions in the previous parentheses are satisfied). To get an idea of how shapes are transformed by conv1d, take a look at the simple example in the gif below (here batch_size is equal to 1):
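The shape arithmetic can also be traced in plain Python. This is a sketch only: conv1d_out_length is a hypothetical helper (not part of PyTorch) that applies the output-length formula given in the nn.Conv1d documentation, and the loop assumes the corrected config with 1 prepended.

```python
def conv1d_out_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution (hypothetical helper, formula from the nn.Conv1d docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

config = [1, 10, 100, 100, 100, 100, 100, 100, 100]  # [1] + original config
batch_size, seq_length, kernel_size = 1, 10, 1

# Shape after torch.unsqueeze(inps, 1): (batch_size, channels, length)
shape = (batch_size, config[0], seq_length)
for in_ch, out_ch in zip(config[:-1], config[1:]):
    assert shape[1] == in_ch  # each conv sees the channel count it was built for
    shape = (batch_size, out_ch, conv1d_out_length(shape[2], kernel_size))

final_length = shape[2]
flat_features = shape[1] * final_length  # what nn.Flatten hands to the linear layer
print(shape)          # (1, 100, 10)
print(flat_features)  # 1000
```

With these settings the linear layer would need 1000 input features, which is final_length * self.config[-1] for final_length = 10.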
I used a Taylor expansion in an image classification task. First, a pixel vector is generated from the RGB image, and each pixel value in that vector is approximated with the Taylor series expansion of sin(x). I tried to code this up in TensorFlow, but I still have a problem when I try to create feature maps by stacking tensors of expansion terms. Can anyone suggest how I can make my current attempt more efficient?
Here are the expansion terms of the Taylor series of sin(x):
sin(x) = x - x^3/3! + x^5/5! - x^7/7! + ...
Here is my current attempt:
term = 2
c = tf.constant([1, -1/6])
power = tf.constant([1, 3])
x = tf.keras.Input(shape=(32, 32, 3))
res =[]
for x in range(term):
    expansion = c * tf.math.pow(tf.tile(x[..., None], [1, 1, 1, 1, term]),power)
    m_ij = tf.math.cumsum(expansion, axis=-1)
    res.append(m_i)
But this is not quite working. I want to create input feature maps from each expansion neuron, so delta_1 and delta_2 need to be stacked, which my attempt above does not do correctly, and my code also does not generalize well. How can I refine my attempt into a correct implementation? Can anyone give me ideas or a canonical answer to improve it?
When doing the series expansion as described, if the input has C channels and the expansion has T terms, the expanded input should have C*T channels and otherwise keep the same shape. Thus, the original input and the function approximated up to each term should be concatenated along the channel dimension. It is a bit easier to do this with a transpose and reshape than with an actual concatenate.
Here is example code for a convolutional network trained on CIFAR10:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D

inputs = tf.keras.Input(shape=(32, 32, 3))
x = inputs
n_terms = 2
c = tf.constant([1, -1/6])
p = tf.constant([1, 3], dtype=tf.float32)
terms = []
for i in range(n_terms):
    m = c[i] * tf.math.pow(x, p[i])
    terms.append(m)
expansion = tf.math.cumsum(terms)
expansion_terms_last = tf.transpose(expansion, perm=[1, 2, 3, 4, 0])
x = tf.reshape(expansion_terms_last, tf.constant([-1, 32, 32, 3*n_terms]))
x = Conv2D(32, (3, 3), input_shape=(32,32,3*n_terms))(x)
This assumes the original network (without expansion) would have a first layer that looks like this:
x = Conv2D(32, (3, 3), input_shape=(32,32,3))(inputs)
and the rest of the network is exactly the same as it would be without expansion.
terms contains a list of c_i * x ^ p_i from the original; expansion contains the partial sums of the terms (the 1st, then the 1st plus the 2nd, etc.) in a single tensor, where T is the first dimension. expansion_terms_last moves the T dimension to be last, and the reshape changes the shape from (..., C, T) to (..., C*T).
The output of model.summary() then looks like this:
__________________________________________________________________________________________________
Layer (type)                    Output Shape          Param #     Connected to
==================================================================================================
input_4 (InputLayer)            [(None, 32, 32, 3)]   0
__________________________________________________________________________________________________
tf_op_layer_Pow_6 (TensorFlowOp [(None, 32, 32, 3)]   0           input_4[0][0]
__________________________________________________________________________________________________
tf_op_layer_Pow_7 (TensorFlowOp [(None, 32, 32, 3)]   0           input_4[0][0]
__________________________________________________________________________________________________
tf_op_layer_Mul_6 (TensorFlowOp [(None, 32, 32, 3)]   0           tf_op_layer_Pow_6[0][0]
__________________________________________________________________________________________________
tf_op_layer_Mul_7 (TensorFlowOp [(None, 32, 32, 3)]   0           tf_op_layer_Pow_7[0][0]
__________________________________________________________________________________________________
tf_op_layer_x_3 (TensorFlowOpLa [(2, None, 32, 32, 3  0           tf_op_layer_Mul_6[0][0]
                                                                  tf_op_layer_Mul_7[0][0]
__________________________________________________________________________________________________
tf_op_layer_Cumsum_3 (TensorFlo [(2, None, 32, 32, 3  0           tf_op_layer_x_3[0][0]
__________________________________________________________________________________________________
tf_op_layer_Transpose_3 (Tensor [(None, 32, 32, 3, 2  0           tf_op_layer_Cumsum_3[0][0]
__________________________________________________________________________________________________
tf_op_layer_Reshape_3 (TensorFl [(None, 32, 32, 6)]   0           tf_op_layer_Transpose_3[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 30, 30, 32)    1760        tf_op_layer_Reshape_3[0][0]
On CIFAR10, this network trains slightly better with expansion - maybe 1% accuracy gain (from 71 to 72%).
Step by step explanation of the code using sample data:
# create a sample input
x = tf.convert_to_tensor([[1,2,3],[4,5,6],[7,8,9]], dtype=tf.float32) # start with H=3, W=3
x = tf.expand_dims(x, axis=0) # add batch dimension N=1
x = tf.expand_dims(x, axis=3) # add channel dimension C=1
# x is now NHWC or (1, 3, 3, 1)
n_terms = 2 # expand to T=2
c = tf.constant([1, -1/6])
p = tf.constant([1, 3], dtype=tf.float32)
terms = []
for i in range(n_terms):
    # this simply calculates m = c_i * x ^ p_i
    m = c[i] * tf.math.pow(x, p[i])
    terms.append(m)
print(terms)
# list of two tensors with shape NHWC or (1, 3, 3, 1)
# calculate each partial sum
expansion = tf.math.cumsum(terms)
print(expansion.shape)
# tensor with shape TNHWC or (2, 1, 3, 3, 1)
# move the T dimension last
expansion_terms_last = tf.transpose(expansion, perm=[1, 2, 3, 4, 0])
print(expansion_terms_last.shape)
# tensor with shape NHWCT or (1, 3, 3, 1, 2)
# stack the last two dimensions together
x = tf.reshape(expansion_terms_last, tf.constant([-1, 3, 3, 1*2]))
print(x.shape)
# tensor with shape NHW and C*T or (1, 3, 3, 2)
# if the input had 3 channels for example, this would be (1, 3, 3, 6)
# now use this as though it was the input
Key assumptions: (1) the c_i and p_i are not learned parameters, so the "expansion neurons" are not actually neurons, just multiply and sum nodes (although neurons sounds cooler :)); (2) the expansion happens for each input channel independently, so C input channels expanded to T terms each produce C*T input features, but the T features from each channel are calculated completely independently of the other channels (it looks like that in the diagram); and (3) the input contains all the partial sums (i.e. c_1 * x ^ p_1, c_1 * x ^ p_1 + c_2 * x ^ p_2 and so forth) but does not contain the individual terms (again, it looks like that in the diagram).
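The same transpose-and-reshape bookkeeping can be reproduced in NumPy to check where each channel ends up. This is a sketch under assumptions: expand_channels is a hypothetical helper written for illustration, and the 3-channel sample input is made up.

```python
import numpy as np

def expand_channels(x, coeffs, powers):
    """Replace each channel of an NHWC array by its T partial sums (C -> C*T channels)."""
    terms = np.stack([c * x ** p for c, p in zip(coeffs, powers)])  # (T, N, H, W, C)
    partial_sums = np.cumsum(terms, axis=0)                          # (T, N, H, W, C)
    terms_last = np.transpose(partial_sums, (1, 2, 3, 4, 0))         # (N, H, W, C, T)
    n, h, w, c, t = terms_last.shape
    return terms_last.reshape(n, h, w, c * t)                        # (N, H, W, C*T)

# Made-up sample: N=1, H=W=3, C=3
x = np.arange(1, 1 + 3 * 3 * 3, dtype=np.float64).reshape(1, 3, 3, 3)
out = expand_channels(x, coeffs=[1.0, -1.0 / 6.0], powers=[1, 3])
print(out.shape)  # (1, 3, 3, 6)
```

After the reshape, partial sum t of input channel c sits at output channel c*T + t, so the T features of one channel stay next to each other and never mix with other channels.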
I have an input layer of size 32x32. I apply a 2D convolution with stride (4,4) and 16 filters, each with kernel size 4x4, so the resulting shape is 8 x 8 x 16. Now I want to reshape the result back to the input shape so that the channel dimension turns back into 4x4 squares in the corresponding places, i.e. if we define the result of the convolution as T and the desired result as D, then I want D[i * 4 + k, j * 4 + l] = T[i, j, k * 4 + l], with i,j = 0,..,7 and k,l = 0,..,3. Is there a way to do this?
import numpy as np
from keras.layers import Input, Conv2D
from keras.initializers import Constant
input = Input((32, 32, 1), dtype='float32')  # Conv2D needs an explicit channel axis
filters = np.ndarray((4, 4, 1, 16), dtype=np.float32)  # (height, width, in_channels, out_channels)
# Initialization of the filter
filter_layer = Conv2D(16, 4, strides=(4, 4), kernel_initializer=Constant(filters), trainable=False)(input)
# no idea how to reshape the filter back
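One way to get this rearrangement is a reshape, a transpose and another reshape. The NumPy sketch below assumes the 16 channels enumerate each 4x4 block row by row, i.e. channel index k * 4 + l; channels_to_blocks is a hypothetical helper name, and the batch dimension is left out for brevity.

```python
import numpy as np

def channels_to_blocks(t, block=4):
    """Rearrange (H, W, block*block) into (H*block, W*block) so that
    d[i*block + k, j*block + l] == t[i, j, k*block + l]."""
    h, w, c = t.shape
    assert c == block * block
    return (t.reshape(h, w, block, block)   # split channels into (k, l)
             .transpose(0, 2, 1, 3)         # reorder axes to (i, k, j, l)
             .reshape(h * block, w * block))

t = np.arange(8 * 8 * 16).reshape(8, 8, 16)  # stand-in for the conv output
d = channels_to_blocks(t)
print(d.shape)  # (32, 32)
```

In TensorFlow, tf.nn.depth_to_space with block_size=4 on an NHWC tensor should perform the same kind of rearrangement, although the channel ordering is worth verifying on a small example before relying on it.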
The goal of the model is to categorically classify video inputs by the word articulated with them. Each input has the dimensionality 45 frames, 1 gray color channel, 100 pixel rows, and 150 pixel columns (45, 1, 100, 150), while each corresponding output is a one hot encoded representation of one of 3 possible words (e.g. "yes" => [0, 0, 1]).
During the compilation of the model, the following error occurs:
ValueError: Dimensions must be equal, but are 1 and 3 for 'Conv2D_94' (op: 'Conv2D') with
input shapes: [?,100,150,1], [3,3,3,32].
Here is the script used to train the model:
video = Input(shape=(self.frames_per_sequence,
                     1,
                     self.rows,
                     self.columns))
cnn = InceptionV3(weights="imagenet",
                  include_top=False)
cnn.trainable = False
encoded_frames = TimeDistributed(cnn)(video)
encoded_vid = LSTM(256)(encoded_frames)
hidden_layer = Dense(output_dim=1024, activation="relu")(encoded_vid)
outputs = Dense(output_dim=class_count, activation="softmax")(hidden_layer)
osr = Model([video], outputs)
optimizer = Nadam(lr=0.002,
                  beta_1=0.9,
                  beta_2=0.999,
                  epsilon=1e-08,
                  schedule_decay=0.004)
osr.compile(loss="categorical_crossentropy",
            optimizer=optimizer,
            metrics=["categorical_accuracy"])
According to Convolution2D in Keras, the following should be the shape of input and filter.
shape of input = [batch, in_height, in_width, in_channels]
shape of filter = [filter_height, filter_width, in_channels, out_channels]
So, the meaning of the error you are getting -
ValueError: Dimensions must be equal, but are 1 and 3 for 'Conv2D_94' (op: 'Conv2D') with
input shapes: [?,100,150,1], [3,3,3,32].
[?,100,150,1] means the in_channels value of the input is 1, whereas [3,3,3,32] means the filter's in_channels value is 3. That's why you are getting the error: Dimensions must be equal, but are 1 and 3.
So you can change the shape of the filter to [3, 3, 1, 32].
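Since the first filter in the error message comes from a pretrained network and has 3 input channels, the mismatch can also be removed from the input side. The NumPy sketch below is an assumption on top of the answer, not something it states: it repeats the gray channel three times, using the channels-last layout shown in the error message and random data as a stand-in for the video.

```python
import numpy as np

frames, rows, cols = 45, 100, 150
# Stand-in for one gray video, channels-last as in the error message.
gray_video = np.random.rand(frames, rows, cols, 1).astype(np.float32)

# Filter shape from the error: [filter_height, filter_width, in_channels, out_channels]
filter_shape = (3, 3, 3, 32)
print(gray_video.shape[-1] == filter_shape[2])  # False

# Repeat the single gray channel so the input has 3 channels.
rgb_like = np.repeat(gray_video, 3, axis=-1)
print(rgb_like.shape)                           # (45, 100, 150, 3)
print(rgb_like.shape[-1] == filter_shape[2])    # True
```

The Input layer of the model would then have to declare 3 channels as well.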
I'm using Theano for classification (convolutional neural networks)
Previously, I've been using the pixel values of the (flattened) image as the features of the NN.
Now, I want to add additional features. I've been told that I can concatenate that vector of additional features to the flattened image features and then use that as input to the fully-connected layer, but I'm having trouble with that.
First of all, is that the right approach?
Here are some code snippets and my errors:
Similar to the provided example from their site with some modifications
(from the class that builds the model)
# allocate symbolic variables for the data
self.x = T.matrix('x') # the data is presented as rasterized images
self.y = T.ivector('y') # the labels are presented as 1D vector of [int] labels
self.f = T.matrix('f') # additional features
Below, variables v and rng are defined previously. What's important is layer2_input:
layer2_input = self.layer1.output.flatten(2)
layer2_input = T.concatenate([layer2_input, self.f.flatten(2)])
self.layer2 = HiddenLayer(rng, input=layer2_input, n_in=v, n_out=200, activation=T.tanh)
(from the class that trains)
train_model = theano.function([index], cost, updates=updates,
    givens={
        model.x: train_set_x[index * batch_size: (index + 1) * batch_size],
        model.y: train_set_y[index * batch_size: (index + 1) * batch_size],
        model.f: train_set_f[index * batch_size: (index + 1) * batch_size]
    })
However, I get an error when the train_model is called:
ValueError: GpuJoin: Wrong inputs for input 1 related to inputs 0.!
Apply node that caused the error: GpuJoin(TensorConstant{0}, GpuElemwise{tanh,no_inplace}.0, GpuFlatten{2}.0)
Inputs shapes: [(), (5, 11776), (5, 2)]
Inputs strides: [(), (11776, 1), (2, 1)]
Inputs types: [TensorType(int8, scalar), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Do the input shapes represent the shapes of x, y and f, respectively?
If so, the third seems correct (batchsize=5, 2 extra features), but why is the first a scalar and the second a matrix?
More details:
train_set_x.shape = (61, 19200) [61 flattened images (160x120), 19200 pixels]
train_set_y.shape = (61,) [61 integer labels]
train_set_f.shape = (61,2) [2 additional features per image]
batch_size = 5
Do I have the right idea or is there a better way of accomplishing this?
Any insights into why I'm getting an error?
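The issue was that I was concatenating along the wrong axis. T.concatenate joins along axis 0 by default, which requires the second dimensions to match, and (5, 11776) and (5, 2) do not. (The shapes listed in the error are the inputs of the join itself: the scalar axis, the layer output and the flattened extra features, not x, y and f.)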
layer2_input = T.concatenate([layer2_input, self.f.flatten(2)])
should have been
layer2_input = T.concatenate([layer2_input, self.f.flatten(2)], axis=1)
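The effect of the axis argument can be reproduced with NumPy, whose concatenate also defaults to axis 0. This is a sketch with zero-filled arrays standing in for the layer output and the extra features, using the shapes from the error message:

```python
import numpy as np

batch_size = 5
layer_output = np.zeros((batch_size, 11776), dtype=np.float32)  # flattened layer1 output
extra = np.zeros((batch_size, 2), dtype=np.float32)             # 2 additional features

# Default axis=0 stacks rows, so the column counts (11776 vs 2) would have to match.
try:
    np.concatenate([layer_output, extra])
    axis0_ok = True
except ValueError:
    axis0_ok = False
print(axis0_ok)  # False

# axis=1 appends the extra features to each sample's feature vector.
joined = np.concatenate([layer_output, extra], axis=1)
print(joined.shape)  # (5, 11778)
```

The hidden layer then receives 11778 inputs per sample, so its n_in presumably has to account for the 2 extra features as well.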