How do CNN layers add their bias values?

I need to write my CNN model as a Theano function, with the weights already set by Keras (TensorFlow as the backend), but I am unsure how to add the bias values associated with each layer.
The solution in "How can I get a 1D convolution in theano" works nicely for writing a single layer as a Theano function, but I need to combine each layer's weights with its biases.
Simplified version of my code:
model = Sequential([
    InputLayer(batch_input_shape=(None, 100, 1)),
    Convolution1D(nb_filter=16, filter_length=8, activation='relu', border_mode='same', init='he_normal', input_shape=(None, 100, 1)),
    Convolution1D(nb_filter=32, filter_length=8, activation='relu', border_mode='same', init='he_normal'),
    MaxPooling1D(pool_length=4),
    Flatten(),
    Dense(output_dim=32, activation='relu', init='he_normal'),
    Dense(output_dim=1, input_dim=32, activation='linear'),
])
How do you add the bias weights to the CNN layer?
For instance, the weights of my first layer have dimensions (8, 1, 1, 16)
with a bias of dimensions (16,),
which is easy enough to concatenate into a single array of dimensions (9, 1, 1, 16).
But the next layer has weights of dimensions (8, 1, 16, 32)
with a bias of dimensions (32,).
How can I combine these into one weight matrix to pass to Theano's T.signal.conv.conv2d function?
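For reference, a convolutional layer normally applies its bias by adding it after the convolution, broadcast over the batch and spatial axes, rather than by merging it into the weight matrix. Below is a rough sketch of one layer written that way with theano.tensor.nnet.conv2d; the filter layout transpose, border mode, and shapes are assumptions to adapt to the actual Keras weights:
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d

x = T.tensor4('x')  # (batch, channels, rows, cols)

# Theano's conv2d expects filters as (nb_filter, input_channels, rows, cols);
# Keras with the TF backend stores them as (rows, cols, input_channels, nb_filter),
# so the trained weights would need something like np.transpose(W_keras, (3, 2, 0, 1)).
W = theano.shared(np.random.randn(16, 1, 8, 1).astype('float32'))
b = theano.shared(np.zeros(16, dtype='float32'))  # one bias per filter

conv_out = conv2d(x, W, border_mode='half')                   # 'same'-style padding
out = T.nnet.relu(conv_out + b.dimshuffle('x', 0, 'x', 'x'))  # add bias, then activation

layer1 = theano.function([x], out)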

Related

How to correctly configure Encoder Decoder LSTM to have one timestep output carrying multiple features

In each observation, I have 6 timesteps, each with 2 features, and I am trying to predict 1 timestep that has 2 parallel features. More specifically,
The shape of my input data is: (81, 6, 2)
The shape of my output data is: (81, 1, 2)
I wrote the following code to build the Encoder-Decoder LSTM:
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_input, 2)))  # n_input = 6 timesteps
model.add(RepeatVector(1))
model.add(LSTM(200, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(100, activation='relu')))
model.add(TimeDistributed(Dense(2)))
The network gives me back the shape (1, 1, 2) when I perform a single prediction.
I want to double-check that this is correct and that I am not missing anything, because the predicted values are very bad (some are negative and others are very high).
Bad predictions are a separate issue. The shape you're getting back does correspond to (samples, timesteps, features) => (1, 1, 2): one sample, one timestep coming from RepeatVector(1), and two features coming from the final TimeDistributed(Dense(2)) layer.
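As a quick sanity check (a hypothetical snippet, assuming the model above was built with n_input = 6 and compiled), you can push one dummy observation through predict and confirm the shape:
import numpy as np

model.compile(loss='mse', optimizer='adam')

x_check = np.random.rand(1, 6, 2)      # one observation shaped like the training data
print(model.predict(x_check).shape)    # -> (1, 1, 2): (samples, timesteps, features)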

ValueError: Error when checking target: expected dense_13 to have shape (None, 6) but got array with shape (6, 1)

I am training a classification network with training data which has X.shape = (1119, 7) and Y.shape = (1119, 6). Below is my simple Keras network with an output dim of 6 (the size of the labels). The error that is returned is shown below the code.
hidden_size = 128

model = Sequential()
model.add(Embedding(7, hidden_size))
#model.add(LSTM(128, input_shape=(1,7)))
model.add(LSTM(hidden_size, return_sequences=True))
model.add(LSTM(hidden_size, return_sequences=True))
model.add(Dense(output_dim=6, activation='softmax'))

# Compile model
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=["categorical_accuracy"])
ValueError: Error when checking target: expected dense_13 to have shape (None, 6) but got array with shape (6, 1)
I would prefer not to do this in TensorFlow because I am just prototyping; this is my first run at Keras and I am confused about why it cannot take this data. I attempted to reshape the data in a number of ways, but nothing worked. Any advice as to why this isn't working would be greatly appreciated.
You should probably remove the parameter return_sequences=True from your last LSTM layer. When using return_sequences=True, the output of the LSTM layer has shape (seq_len, hidden_size). Passing this on to a Dense layer gives you an output shape of (seq_len, 6), which is incompatible with your labels. If you instead omit return_sequences=True, then your LSTM layer returns shape (hidden_size,) (it only returns the last element of the sequence) and subsequently your final Dense layer will have output shape (6,) like your labels.
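A minimal sketch of the suggested change (sizes copied from the question, everything else assumed unchanged):
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

hidden_size = 128

model = Sequential()
model.add(Embedding(7, hidden_size))
model.add(LSTM(hidden_size, return_sequences=True))
model.add(LSTM(hidden_size))                 # last LSTM returns only the final state: (hidden_size,)
model.add(Dense(6, activation='softmax'))    # matches the (1119, 6) labels
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['categorical_accuracy'])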

Keras in Python: LSTM Dimensions

I am building an LSTM network.
My data looks as following:
X_train.shape = (134, 300000, 4)
X_train contains 134 sequences, with 300000 timesteps and 4 features.
Y_train.shape = (134, 2)
Y_train contains 134 labels, [1, 0] for True and [0, 1] for False.
Below is my model in Keras.
model = Sequential()
model.add(LSTM(4, input_shape=(300000, 4), return_sequences=True))
model.compile(loss='categorical_crossentropy', optimizer='adam')
Whenever I run the model, I get the following error:
Error when checking target: expected lstm_52 to have 3 dimensions, but got array with shape (113, 2)
It seems to be related to my Y_train data -- as its shape is (113, 2).
Thank you!
The output shape of your LSTM layer is (batch_size, 300000, 4) (because of return_sequences=True). Therefore your model expects the target y_train to have 3 dimensions but you are passing an array with only 2 dimensions (batch_size, 2).
You probably want to use return_sequences=False instead. In this case the output shape of the LSTM layer will be (batch_size, 4). Moreover, you should add a final softmax layer to your model in order to have the desired output shape of (batch_size, 2):
model = Sequential()
model.add(LSTM(4, input_shape=(300000, 4), return_sequences=False))
model.add(Dense(2, activation='softmax')) # 2 neurons because you have 2 classes
model.compile(loss='categorical_crossentropy', optimizer='adam')

understanding shapes for Keras model

I am trying to wrap my head around the shape needed for my specific task. I am attempting to train a Q-learner on some time series data contained in a dataframe. My dataframe has the following columns: open, close, high, low, and I am trying to get a sliding window of, say, 50 timesteps. Here is example code for each window:
window = df.iloc[0:50]
df_norm = (window - window.mean()) / (window.max() - window.min())
x = df_norm.values
x = np.expand_dims(x, axis=0)
print x.shape
#(1,50, 4)
Now that I know my shape is (1, 50, 4) for each item in X, I'm at a loss as to what shape to feed my model. Let's say I have the following:
model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(50,4)))
model.add(LSTM(32, return_sequences=True))
model.add(Dense(num_actions))
This gives the following error:
ValueError: could not broadcast input array from shape (50,4) into shape (1,50)
And here is another attempt:
model = Sequential()
model.add(Dense(hidden_size, input_shape=(50,4), activation='relu'))
model.add(Dense(hidden_size, activation='relu'))
model.add(Dense(num_actions))
model.compile(sgd(lr=.2), "mse")
which gives the following error:
ValueError: could not broadcast input array from shape (50,4) into shape (1,50)
Here is the shape the model is expecting and the state from my env:
print "Inputs: {}".format(model.input_shape)
print "actual: {}".format(env.state.shape)
#Inputs: (None, 50, 4)
#actual: (1, 50, 4)
Can someone explain where I am going wrong with the shapes here?
The recurrent layer takes inputs of shape (batch_size, timesteps, input_features). Since the shape of x is (1, 50, 4), the data should be interpreted as a single batch of 50 timesteps, each containing 4 features. When initializing the first layer of a model, you pass an input_shape: a tuple specifying the shape of the input, excluding the batch_size dimension. In the case of LSTM layers, you can pass None as the timesteps dimension. Hence, this is how the first layer of the network should be initialized:
model.add(LSTM(32, return_sequences=True, input_shape=(None, 4)))
The second LSTM layer is followed by a dense layer. So you don't need to return sequences for this layer. Hence, this is how you should initialize the second LSTM layer:
model.add(LSTM(32))
Every batch of 50 time steps in x is supposed to be mapped to a single action vector in y. Therefore, since the shape of x is (1, 50, 4), the shape of y must be (1, num_actions). Make sure y doesn't have the timesteps dimension.
Therefore, under the assumption that x and y have the right shapes, the following code should work:
model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(None, 4)))
model.add(LSTM(32))
model.add(Dense(num_actions))
model.compile(sgd(lr=.2), "mse")
# x.shape == (1, 50, 4)
# y.shape == (1, num_actions)
history = model.fit(x, y)

Softmax layer in Keras returns a vector of 1s

I want to build a CNN in Keras with a softmax layer as the output, but I only get this back:
[[[[ 1.]
[ 1.]
[ 1.]]]]
My model is built like this:
model = Sequential()
model.add(Conv2D(2, (1,3), padding='valid',
                 input_shape=(3,3,50), init='normal', data_format='channels_first'))
model.add(Activation('relu'))
model.add(Conv2D(20, (1,48), init='normal', data_format='channels_first'))
model.add(Activation('relu'))
model.add(Conv2D(1, (1, 1), init='normal', data_format='channels_first', activation='softmax'))
I don't really get why the softmax does not work. Is it maybe because of a wrong input shape?
The softmax activation will be applied to the last axis.
Looking at your model.summary(), your output shape is (None, 3, 3, 1).
With only one element along the last axis, your softmax output will indeed always be 1.
You must select the axis along which you want the values to sum to 1, and then reshape the output accordingly. For instance, if you want the softmax to consider the 3 channels, you need to move those channels to the last position:
#your last convolutional layer, without the activation:
model.add(Conv2D(3, (1, 1), kernel_initializer='normal', data_format='channels_first'))
#a permute layer to move the channels to the last position:
model.add(Permute((2,3,1)))
#the softmax, now applied across the channels so that they sum to 1
model.add(Activation('softmax'))
But if your purpose is for the entire result to sum to 1, then you should add a Flatten() instead of the Permute().
Keras seems to be more suited to working with channels_last. In this case, the softmax would automatically be applied to channels, without extra work needed.
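For illustration, a hypothetical channels_last version of the same model (input and filter sizes mirrored from the question, with the 3-filter final layer suggested above) applies the softmax across the channels without any Permute:
from keras.models import Sequential
from keras.layers import Conv2D, Activation

model = Sequential()
model.add(Conv2D(2, (1, 3), padding='valid', input_shape=(3, 50, 3),
                 kernel_initializer='normal', data_format='channels_last'))
model.add(Activation('relu'))
model.add(Conv2D(20, (1, 48), kernel_initializer='normal', data_format='channels_last'))
model.add(Activation('relu'))
# channels are now the last axis, so softmax sums to 1 across the 3 filters
model.add(Conv2D(3, (1, 1), kernel_initializer='normal',
                 data_format='channels_last', activation='softmax'))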
You have a problem with your model architecture. If you look at some of the popular models:
https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py
You will find that the structure of a CNN usually looks like this (see the sketch below):
Some convolutional layers with relu activation
A pooling layer (e.g. max pooling, mean pooling)
Flatten the convolutional output
A dense layer
Then, a softmax
It does not make sense to apply a softmax to a convolutional layer.
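For reference, a minimal sketch of that conventional structure (layer sizes are illustrative only, not taken from the question):
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))  # conv + relu
model.add(Conv2D(64, (3, 3), activation='relu'))                           # conv + relu
model.add(MaxPooling2D(pool_size=(2, 2)))                                  # pooling
model.add(Flatten())                                                       # flatten the conv output
model.add(Dense(128, activation='relu'))                                   # dense layer
model.add(Dense(10, activation='softmax'))                                 # softmax at the end
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])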
