How do I correctly use the Keras Embedding layer? - python

I have written the following multi-input Keras TensorFlow model:
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D, Dense, Dropout
from tensorflow.keras.models import Model

CHARPROTLEN = 25   # size of the protein vocab
CHARCANSMILEN = 62 # size of the SMILES vocab
protein_input = Input(shape=(train_protein.shape[1:]))
compound_input = Input(shape=(train_smile.shape[1:]))
#protein layers
x = Embedding(input_dim=CHARPROTLEN+1, output_dim=128, input_length=maximum_amino_acid_sequence_length)(protein_input)
x = Conv1D(filters=32, padding="valid", activation="relu", strides=1, kernel_size=4)(x)
x = Conv1D(filters=64, padding="valid", activation="relu", strides=1, kernel_size=8)(x)
x = Conv1D(filters=96, padding="valid", activation="relu", strides=1, kernel_size=12)(x)
final_protein = GlobalMaxPooling1D()(x)
#compound layers
y = Embedding(input_dim=CHARCANSMILEN+1, output_dim=128, input_length=maximum_SMILES_length)(compound_input)
y = Conv1D(filters=32, padding="valid", activation="relu", strides=1, kernel_size=4)(y)
y = Conv1D(filters=64, padding="valid", activation="relu", strides=1, kernel_size=6)(y)
y = Conv1D(filters=96, padding="valid", activation="relu", strides=1, kernel_size=8)(y)
final_compound = GlobalMaxPooling1D()(y)
join = tf.keras.layers.concatenate([final_protein, final_compound], axis=-1)
x = Dense(1024, activation="relu")(join)
x = Dropout(0.1)(x)
x = Dense(1024, activation='relu')(x)
x = Dropout(0.1)(x)
x = Dense(512, activation='relu')(x)
predictions = Dense(1,kernel_initializer='normal')(x)
model = Model(inputs=[protein_input, compound_input], outputs=[predictions])
The inputs have the following shapes:
train_protein.shape
TensorShape([5411, 1500, 1])
train_smile.shape
TensorShape([5411, 100, 1])
I get the following error message:
ValueError: One of the dimensions in the output is <= 0 due to downsampling in conv1d. Consider increasing the input size. Received input shape [None, 1500, 1, 128] which would produce output shape with a zero or negative value in a dimension.
Is this due to the Embedding layer having the incorrect output_dim? How do I correct this? Thanks.

A Conv1D layer requires inputs of shape (batch_size, timesteps, features), which train_protein and train_smile already have. For example, train_protein consists of 5411 samples, where each sample has 1500 timesteps and each timestep has one feature. Applying an Embedding layer to such an input adds an extra dimension, producing a 4D tensor that Conv1D layers cannot work with.
You have two options: either leave out the Embedding layer altogether and feed your inputs directly to the Conv1D layers, or reshape your data to (5411, 1500) for train_protein and (5411, 100) for train_smile. You can use tf.reshape, tf.squeeze, or tf.keras.layers.Reshape for this, and afterwards use the Embedding layer as planned. Also note that output_dim only determines the size of the vector each timestep is mapped to, so it is not the cause of the error.
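A minimal sketch of the second option, using tf.squeeze (the random integer tensors here just stand in for the question's train_protein and train_smile):
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding

CHARPROTLEN = 25    # vocab sizes from the question
CHARCANSMILEN = 62

# dummy integer-encoded data standing in for train_protein / train_smile
train_protein = tf.random.uniform([5411, 1500, 1], 0, CHARPROTLEN + 1, dtype=tf.int32)
train_smile = tf.random.uniform([5411, 100, 1], 0, CHARCANSMILEN + 1, dtype=tf.int32)

# drop the trailing singleton dimension: (samples, timesteps, 1) -> (samples, timesteps)
train_protein = tf.squeeze(train_protein, axis=-1)
train_smile = tf.squeeze(train_smile, axis=-1)

protein_input = Input(shape=(1500,))
compound_input = Input(shape=(100,))
x = Embedding(input_dim=CHARPROTLEN + 1, output_dim=128)(protein_input)     # (None, 1500, 128)
y = Embedding(input_dim=CHARCANSMILEN + 1, output_dim=128)(compound_input)  # (None, 100, 128)
# ...the Conv1D / GlobalMaxPooling1D branches from the question can follow unchanged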

Related

Permute Layer: Negative dimension size caused by subtracting 3 from 2

I have two sensor inputs that I fuse with a Concatenate layer. Both are time-series data, and I'm now trying to apply a Permute layer to the concatenated result. However, when I do so, I get the error:
Negative dimension size caused by subtracting 3 from 2 for '{{node conv1d_334/conv1d}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](conv1d_334/conv1d/ExpandDims, conv1d_334/conv1d/ExpandDims_1)' with input shapes: [?,1,2,6249], [1,3,6249,128].
My inputs are both time-series data of shape (1176, 6249, 1). Can anybody tell me what I'm doing wrong? Here is some sample code:
lr = 0.0005
n_timesteps = 3750
n_features = 1
n_outputs = 3
def small_model(optimizer='rmsprop', init='glorot_uniform'):
    signal1 = Input(shape=(X_train.shape[1:]))
    signal2 = Input(shape=(X_train_phase.shape[1:]))
    concat_signal = Concatenate()([signal1, signal2])
    # x = InputLayer(input_shape=(None, X_train.shape[1:][0],1))(inputA)
    x = Permute(dims=(2, 1))(concat_signal)
    x = BatchNormalization()(x)
    x = Conv1D(64, 5, activation='relu', kernel_initializer='glorot_normal')(x)  # , input_shape=(None, 3750, n_features)
    x = Conv1D(64, 5, activation='relu', kernel_initializer='glorot_normal')(x)
    x = MaxPooling1D(5)(x)
    x = Dropout(0.3)(x)
Your problem is that by the time you reach the convolution, your time dimension (2) is smaller than the kernel size you have specified (5). You can verify this with some dummy data:
import tensorflow as tf
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Permute
from tensorflow.keras.layers import Concatenate
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv1D
# dummy data w/batch 32
X_train = tf.random.normal([32, 6249, 1])
X_train_phase = tf.random.normal([32, 6249, 1])
signal1 = Input(shape=(X_train.shape[1:]))
signal2 = Input(shape=(X_train_phase.shape[1:]))
concat_signal = Concatenate()([signal1, signal2])
x = Permute(dims=(2, 1))(concat_signal)
x = BatchNormalization()(x)
print(x.shape)
# (None, 2, 6249)
If you look at the docs for tf.keras.layers.Conv1D, you'll notice that "valid" is the default padding, which means there is no padding. There is a great reference, "A guide to convolution arithmetic for deep learning", which does a good job of illustrating the relationship between input size, kernel size, strides, and padding.
While I am not sure what you're trying to accomplish with this network, adding the argument padding="same" to your convolution layers will send the input through without issue.
x = Conv1D(
    filters=64,
    kernel_size=5,
    activation="relu",
    padding="same",  # <= add this
    kernel_initializer="glorot_normal")(x)

Adding padding='same' argument to Conv2D layers broke the model

I created this model
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D, Input, Dense
from tensorflow.keras.layers import Reshape, Flatten
from tensorflow.keras import Model
def create_DeepCAPCHA(input_shape=(28,28,1), n_prediction=1, n_class=10, optimizer='adam',
                      show_summary=True):
    inputs = Input(input_shape)
    x = Conv2D(filters=32, kernel_size=3, activation='relu', padding='same')(inputs)
    x = MaxPooling2D(pool_size=2)(x)
    x = Conv2D(filters=48, kernel_size=3, activation='relu', padding='same')(x)
    x = MaxPooling2D(pool_size=2)(x)
    x = Conv2D(filters=64, kernel_size=3, activation='relu', padding='same')(x)
    x = MaxPooling2D(pool_size=2)(x)
    x = Flatten()(x)
    x = Dense(512, activation='relu')(x)
    x = Dense(units=n_prediction*n_class, activation='softmax')(x)
    outputs = Reshape((n_prediction, n_class))(x)
    model = Model(inputs, outputs)
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    if show_summary:
        model.summary()
    return model
I tried the model on the MNIST dataset:
import tensorflow as tf
import numpy as np
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
inputs = x_train
outputs = tf.keras.utils.to_categorical(y_train, num_classes=10)
outputs = np.expand_dims(outputs,1)
model = create_DeepCAPCHA(input_shape=(28,28,1),n_prediction=1,n_class=10)
model.fit(inputs, outputs, epochs=10, validation_split=0.1)
but it failed to converge (stuck at 10% accuracy, i.e. the same as random guessing). Yet when I remove the padding='same' argument from the Conv2D layers, it works flawlessly:
def working_DeepCAPCHA(input_shape=(28,28,1), n_prediction=1, n_class=10, optimizer='adam',
                       show_summary=True):
    inputs = Input(input_shape)
    x = Conv2D(filters=32, kernel_size=3, activation='relu')(inputs)
    x = MaxPooling2D(pool_size=2)(x)
    x = Conv2D(filters=48, kernel_size=3, activation='relu')(x)
    x = MaxPooling2D(pool_size=2)(x)
    x = Conv2D(filters=64, kernel_size=3, activation='relu')(x)
    x = MaxPooling2D(pool_size=2)(x)
    x = Flatten()(x)
    x = Dense(512, activation='relu')(x)
    x = Dense(units=n_prediction*n_class, activation='softmax')(x)
    outputs = Reshape((n_prediction, n_class))(x)
    model = Model(inputs, outputs)
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    if show_summary:
        model.summary()
    return model
Does anyone have any idea what the problem is?
Thank you for sharing, this was really interesting to me, so I wrote the code and tested several scenarios. Note that what I'm going to say is just my guess and I'm not sure about it.
My conclusion from those tests is that 'valid' (i.e. no) padding works because it produces a (1, 1, 64) output shape at the last conv layer. If you set the padding to 'same', it instead produces (3, 3, 64), and because the next layer is a big Dense layer, this multiplies its parameter count by roughly 9 (I expected that to somehow result in overfitting) and seems to make it much harder for the network to find good values for the parameters. So I tried several different ways to reduce the output of the last conv layer back to (1, 1, 64), as below:
using one more conv layer + max pooling
changing the last max pooling to pool_size=4
using a stride of 2 for one of the conv layers
changing the filters of the last conv layer to 20
and they all worked well. Even changing the dense units from 512 to 64 helps as well (note that even now you may occasionally get poor results, because of bad initialization I guess).
Then I changed the shape of the last conv layer to (2, 2, 64), and the chance of getting a good result (more than 90% accuracy) dropped (a lot of the time I got 10% accuracy).
So it seems that having a lot of parameters can confuse the model. But if you want to know why the network does not simply overfit, I have no answer for you.
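As a rough sanity check, a minimal sketch of the second fix from the list above: with padding='same' and the question's three 2x2 poolings, the last conv output is (3, 3, 64), so Flatten feeds 576 values into Dense(512) instead of 64; changing the last pooling to pool_size=4 brings it back to (1, 1, 64):
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Input, Dense, Reshape, Flatten
from tensorflow.keras import Model

inputs = Input((28, 28, 1))
x = Conv2D(filters=32, kernel_size=3, activation='relu', padding='same')(inputs)
x = MaxPooling2D(pool_size=2)(x)   # 28 -> 14
x = Conv2D(filters=48, kernel_size=3, activation='relu', padding='same')(x)
x = MaxPooling2D(pool_size=2)(x)   # 14 -> 7
x = Conv2D(filters=64, kernel_size=3, activation='relu', padding='same')(x)
x = MaxPooling2D(pool_size=4)(x)   # 7 -> 1: back to a (1, 1, 64) feature map
x = Flatten()(x)
x = Dense(512, activation='relu')(x)   # 64*512 + 512 = 33,280 parameters again
x = Dense(units=10, activation='softmax')(x)
outputs = Reshape((1, 10))(x)
model = Model(inputs, outputs)
model.summary()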

Tensorflow/Keras: tf.reshape to Concatenate after multiple Conv2D

I am implementing multiple Conv2D layers, then I concatenate the outputs.
x = Conv2D(f, kernel_size=(3,3), strides=(1,1))(input)
y = Conv2D(f, kernel_size=(5,5), strides=(2,2))(input)
output = Concatenate()([x, y])
As you know, different kernel sizes and strides produce different output shapes. I could do this:
x = Conv2D(f, kernel_size=(3,3), strides=(1,1), padding="same")(input)
y = Conv2D(f, kernel_size=(5,5), strides=(2,2), padding="same")(input)
output = Concatenate()([x, y])
But that would increase the number of channels a lot, which makes me run out of memory. I can also calculate the output shape, but that would be inconvenient if I change the kernel size.
I tried:
y = tf.reshape(y, x.shape)
But it gave the error:
ValueError: Cannot convert a partially known TensorShape to a Tensor
Is there an easy way to concatenate the outputs from multiple Conv2D layers?
You cannot concatenate the outputs of two layers if their shapes don't match. You can use the ZeroPadding2D layer to add rows and columns of zeros to the smaller output so that the shapes match.
Here is a short example along with the shapes.
Code:
from tensorflow.keras.layers import *
from tensorflow.keras import *
import tensorflow as tf
input = Input(shape = (28,28,3))
x = Conv2D(3, kernel_size=(3,3), strides=(1,1))(input)
y = Conv2D(3, kernel_size=(5,5), strides=(2,2))(input)
z = tf.keras.layers.ZeroPadding2D(padding=(7,7))(y)
output = Concatenate()([x, z])
model = Model(inputs = input, outputs = output)
tf.keras.utils.plot_model(model, 'my_first_model.png', show_shapes=True)
Output: both branches now produce (26, 26, 3) feature maps, which concatenate into a (26, 26, 6) tensor.
For this example, I have used an input shape of (28, 28, 3). You can plug in your own input shape and change the number of padding rows and columns accordingly.
You can take a look at the documentation of ZeroPadding2D for more details.
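If you would rather not hard-code the padding amounts, here is a small sketch that derives them from the two branch shapes (ZeroPadding2D accepts a ((top, bottom), (left, right)) tuple, so odd differences are handled too):
from tensorflow.keras.layers import Input, Conv2D, Concatenate, ZeroPadding2D
from tensorflow.keras import Model

inputs = Input(shape=(28, 28, 3))
x = Conv2D(3, kernel_size=(3, 3), strides=(1, 1))(inputs)   # (None, 26, 26, 3)
y = Conv2D(3, kernel_size=(5, 5), strides=(2, 2))(inputs)   # (None, 12, 12, 3)

# pad y so its height and width match x, splitting any odd difference across the two sides
diff_h = x.shape[1] - y.shape[1]
diff_w = x.shape[2] - y.shape[2]
z = ZeroPadding2D(padding=((diff_h // 2, diff_h - diff_h // 2),
                           (diff_w // 2, diff_w - diff_w // 2)))(y)

output = Concatenate()([x, z])                # (None, 26, 26, 6)
model = Model(inputs=inputs, outputs=output)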

How to convert a tensorflow model to a pytorch model?

I'm new to PyTorch. Here's the architecture of a TensorFlow model, and I'd like to convert it into a PyTorch model.
I have written most of the code but am confused about a few places.
1) In TensorFlow, the Conv2D layer takes filters as an argument. However, in PyTorch, the layer takes the number of input channels and output channels as arguments. How do I find the equivalent input and output channels, given the number of filters?
2) In TensorFlow, the Dense layer has a parameter for the number of nodes (units). However, in PyTorch, the equivalent layer takes two different arguments (the input size and the output size); how do I determine them from the number of nodes?
Here's the tensorflow code.
from keras.utils import to_categorical
from keras.models import Sequential, load_model
from keras.layers import Conv2D, MaxPool2D, Dense, Flatten, Dropout
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu', input_shape=X_train.shape[1:]))
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(rate=0.25))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(rate=0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(rate=0.5))
model.add(Dense(43, activation='softmax'))
Here's my code:
import torch
import torch.nn as nn
import torch.nn.functional as F

# The network should inherit from nn.Module
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Define 2D convolution layers
        # 3: input channels, 32: output channels, 5: kernel size, 1: stride
        self.conv1 = nn.Conv2d(3, 32, 5, 1)  # the input has 3 channels because all images are coloured
        self.conv2 = nn.Conv2d(32, 64, 5, 1)
        self.conv3 = nn.Conv2d(64, 128, 3, 1)
        self.conv4 = nn.Conv2d(128, 256, 3, 1)
        # Dropout 'filters' out some of the input by setting values to zero with the given probability
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        # Fully connected layers: input size, output size
        self.fc1 = nn.Linear(36864, 128)
        self.fc2 = nn.Linear(128, 10)

    # forward() links all layers together
    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = self.conv3(x)
        x = F.relu(x)
        x = self.conv4(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
Thanks in advance!
1) In PyTorch, a conv layer takes input channels and output channels as arguments. For your first layer, the input channels will be the number of colour channels in your image. After that, it's always the same as the output channels of the previous layer (output channels are what the filters parameter specifies in TensorFlow).
2) PyTorch is slightly annoying in that when flattening your conv outputs you have to calculate the resulting shape yourself. You can either use the equation Out = (W - F + 2P)/S + 1 (where W is the input size, F the kernel size, P the padding and S the stride), or write a shape-calculating function that passes a dummy image through the conv part of the network. That value becomes the input size of your first Linear layer; its output size is simply the number of nodes you want in the next fully connected layer.
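For example, a minimal sketch of both points (the 32x32 RGB input here is an assumption; use your real image size), mapping the Keras filter counts 32, 32, 64, 64 onto PyTorch channel arguments and using a dummy forward pass to find the size going into the first Linear layer:
import torch
import torch.nn as nn

# conv stack mirroring the Keras model: filters 32, 32, 64, 64 with kernels 5, 5, 3, 3.
# out_channels of each layer becomes in_channels of the next; the first in_channels
# is 3 because the images are RGB.
conv_part = nn.Sequential(
    nn.Conv2d(3, 32, 5), nn.ReLU(),
    nn.Conv2d(32, 32, 5), nn.ReLU(),
    nn.MaxPool2d(2), nn.Dropout2d(0.25),
    nn.Conv2d(32, 64, 3), nn.ReLU(),
    nn.Conv2d(64, 64, 3), nn.ReLU(),
    nn.MaxPool2d(2), nn.Dropout2d(0.25),
)

# dummy forward pass to find the flattened size (assumed 32x32 RGB input)
dummy = torch.zeros(1, 3, 32, 32)
flat_size = conv_part(dummy).flatten(1).shape[1]
print(flat_size)  # 1024 for a 32x32 input: 64 channels * 4 * 4

fc1 = nn.Linear(flat_size, 256)  # 256 matches Dense(256) in the Keras model
fc2 = nn.Linear(256, 43)         # 43 matches Dense(43)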

How can I limit regression output between 0 to 1 in keras

I am trying to detect the single-pixel location of a single object in an image. I have a Keras CNN regression network with my image tensor as the input and a 3-item vector as the output.
First item: 1 if an object was found, 0 if no object was found
Second item: a number between 0 and 1 indicating how far along the x axis the object is
Third item: a number between 0 and 1 indicating how far along the y axis the object is
I have trained the network on 2000 test images and 500 validation images, and the val_loss is far less than 1, and the val_acc is best at around 0.94. Excellent.
But then when I predict the output, I find that the values for all three output items are not between 0 and 1; they are actually between roughly -2 and 3. All three items should be between 0 and 1.
I have not used any non-linear activation functions on the output layer, and have used relus for all non-output layers. Should I be using a softmax, even though it is non-linear? The second and third items are predicting the x and y axis of the image, which appear to me as linear quantities.
Here is my keras network:
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, Dropout,
                                     MaxPooling2D, Flatten, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

inputs = Input((256, 256, 1))
base_kernels = 64
# 256
conv1 = Conv2D(base_kernels, 3, activation='relu', padding='same', kernel_initializer='he_normal')(inputs)
conv1 = BatchNormalization()(conv1)
conv1 = Conv2D(base_kernels, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv1)
conv1 = BatchNormalization()(conv1)
conv1 = Dropout(0.2)(conv1)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
# 128
conv2 = Conv2D(base_kernels * 2, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool1)
conv2 = BatchNormalization()(conv2)
conv2 = Conv2D(base_kernels * 2, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv2)
conv2 = BatchNormalization()(conv2)
conv2 = Dropout(0.2)(conv2)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
# 64
conv3 = Conv2D(base_kernels * 4, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool2)
conv3 = BatchNormalization()(conv3)
conv3 = Conv2D(base_kernels * 4, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv3)
conv3 = BatchNormalization()(conv3)
conv3 = Dropout(0.2)(conv3)
pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)
flat = Flatten()(pool3)
dense = Dense(256, activation='relu')(flat)
output = Dense(3)(dense)
model = Model(inputs=[inputs], outputs=[output])
optimizer = Adam(lr=1e-4)
model.compile(optimizer=optimizer, loss='mean_absolute_error', metrics=['accuracy'])
Can anyone please help? Thanks! :)
Chris
The sigmoid activation produces outputs between zero and one, so if you use it as the activation of your last (output) layer, the network's output will be between zero and one.
output = Dense(3, activation="sigmoid")(dense)
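As a quick illustration of what the sigmoid does to raw outputs (values approximate):
import tensorflow as tf

# sigmoid squashes any real number into the open interval (0, 1)
raw = tf.constant([-2.0, 0.0, 3.0])
print(tf.keras.activations.sigmoid(raw).numpy())
# [0.119 0.5   0.953]
Since all three targets lie in [0, 1], this keeps the predictions in the valid range while the mean_absolute_error loss can stay as it is.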
