I am working on a binary classification problem using Keras.
The input shape is: (None, 2, 94, 3) (channels is the last dimension)
I have the following architecture:
input1 = Input(shape=(time, n_rows, n_channels))
masking = Masking(mask_value=-999)(input1)
convlstm = ConvLSTM1D(filters=16, kernel_size=15,
                      data_format='channels_last',
                      activation="tanh")(masking)
dropout = Dropout(0.2)(convlstm)
flatten1 = Flatten()(dropout)
outputs = Dense(n_outputs, activation='sigmoid')(flatten1)
model = Model(inputs=input1, outputs=outputs)
model.compile(loss=keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
However, when training I get this error: Dimensions must be equal, but are 94 and 80 for '{{node conv_lstm1d/while/SelectV2}} = SelectV2[T=DT_FLOAT](conv_lstm1d/while/Tile, conv_lstm1d/while/mul_5, conv_lstm1d/while/Placeholder_2)' with input shapes: [?,94,16], [?,80,16], [?,80,16].
If I remove the Masking layer this error disappears. What is the masking doing that triggers this error? Also, the only way I was able to run the above architecture was with a kernel_size of 1.
It seems the ConvLSTM1D layer needs a mask with the shape (samples, timesteps), according to the docs, while the mask you are computing has the shape (samples, time, rows). Here is one solution to your problem, although I am not sure whether it is the 'correct' way to go:
import tensorflow as tf

input1 = tf.keras.layers.Input(shape=(2, 94, 3))
masking = tf.keras.layers.Masking(mask_value=-999)(input1)
convlstm = tf.keras.layers.ConvLSTM1D(filters=16, kernel_size=15,
                                      data_format='channels_last',
                                      activation="tanh")(inputs=masking, mask=tf.reduce_all(masking._keras_mask, axis=-1))
dropout = tf.keras.layers.Dropout(0.2)(convlstm)
flatten1 = tf.keras.layers.Flatten()(dropout)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(flatten1)
model = tf.keras.Model(inputs=input1, outputs=outputs)
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
The line mask=tf.reduce_all(masking._keras_mask, axis=-1) reduces your mask to (samples, timesteps) by applying an AND operation across the last dimension of the mask. Alternatively, you could create your own custom mask layer:
import tensorflow as tf

class Reduce(tf.keras.layers.Layer):
    def __init__(self):
        super(Reduce, self).__init__()

    def call(self, inputs):
        # A timestep is kept only if all of its rows contain at least one non-padded channel;
        # reducing over the last axis gives a (samples, timesteps) mask as ConvLSTM1D expects.
        return tf.reduce_all(tf.reduce_any(tf.not_equal(inputs, -999), axis=-1, keepdims=False), axis=-1)

input1 = tf.keras.layers.Input(shape=(2, 94, 3))
reduce_layer = Reduce()
boolean_mask = reduce_layer(input1)
convlstm = tf.keras.layers.ConvLSTM1D(filters=16, kernel_size=15,
                                      data_format='channels_last',
                                      activation="tanh")(inputs=input1, mask=boolean_mask)
dropout = tf.keras.layers.Dropout(0.2)(convlstm)
flatten1 = tf.keras.layers.Flatten()(dropout)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(flatten1)
model = tf.keras.Model(inputs=input1, outputs=outputs)
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
print(model.summary(expand_nested=True))

x = tf.random.normal((50, 2, 94, 3))
y = tf.random.uniform((50,), maxval=2, dtype=tf.int32)  # binary labels in {0, 1}
model.fit(x, y)
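For reference, here is a minimal check (a sketch that assumes the same -999 padding convention as above) that the reduced mask really has the (samples, timesteps) shape ConvLSTM1D expects:
import tensorflow as tf

# Toy batch: 4 samples, 2 timesteps, 94 rows, 3 channels, with the second timestep fully padded
toy = tf.random.normal((4, 2, 94, 3))
toy = tf.concat([toy[:, :1], tf.fill((4, 1, 94, 3), -999.0)], axis=1)

full_mask = tf.reduce_any(tf.not_equal(toy, -999.0), axis=-1)   # (4, 2, 94), same shape as Masking's mask
reduced_mask = tf.reduce_all(full_mask, axis=-1)                # (4, 2) -> (samples, timesteps)
print(full_mask.shape, reduced_mask.shape)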
I am trying to use the output of a variational autoencoder to aid in classifying images. I have pre-trained the autoencoder and am now trying to load the weights in another script so I can use the encoder model for prediction. I get a strange error when calling the encoder that I cannot make sense of. When I try to call the encoder on a sample, I am told that the shapes are incompatible:
ValueError: Input 0 of layer dense is incompatible with the layer: expected axis -1 of input shape to have value 1048576 but received input with shape (256, 8192). This is confusing because I pre-trained the model fine and have instantiated the model exactly as before (I copy/pasted the code). I have based my model on this YouTube tutorial.
I will also paste in my code:
########## Library Imports ##########
import os, sys
import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Conv2D, Input, Flatten, Dense, Lambda, Reshape, Conv2DTranspose
import keras
import keras.backend as K
from keras.models import Model
from PIL import Image
print(tf.version.VERSION)
img_height = 256 #chosen
img_width = 256
num_channels = 1 #grayscale
input_shape = (img_height, img_width, num_channels)
########## Load VAE Weights ##########
vae_path = os.path.join(os.getcwd(), 'vae_training')
checkpoint_path = os.path.join(vae_path, 'cp.ckpt')
print('vae_path listdir\n', os.listdir(vae_path))
#load patches
#patch_locs = sys.argv[1] #path to the patch folders
patch_locs = r'C:\Users\Daniel\Documents\GitHub\endo_git_v2\patches\single_wsi_for_local_parent'
patch_folders = os.listdir(patch_locs)
print(patch_folders)
########## INSTANTIATE MODEL AND LOAD WEIGHTS ##########
#REPARAMETERIZATION TRICK
# Define sampling function to sample from the distribution
# Reparameterize the sample following the process described by Gunderson and Huang:
# z = mu + exp(sigma / 2) * eps
# This keeps the sampling step differentiable so gradients can be estimated accurately.
def sample_z(args):
    z_mu, z_sigma = args
    z_mu = tf.cast(z_mu, dtype=tf.float32)
    z_sigma = tf.cast(z_sigma, dtype=tf.float32)
    eps = K.random_normal(shape=(K.shape(z_mu)[0], K.int_shape(z_mu)[1]))
    out = z_mu + K.exp(z_sigma / 2) * eps
    return out
#Define custom loss
# The VAE is trained with two loss terms: reconstruction loss and KL divergence.
# Let us add a class defining a custom layer that carries this loss.
class CustomLayer(keras.layers.Layer):

    def vae_loss(self, x, z_decoded):
        x = K.flatten(x)
        z_decoded = K.flatten(z_decoded)
        # Reconstruction loss (since we used a sigmoid activation we can use binary crossentropy)
        recon_loss = keras.metrics.binary_crossentropy(x, z_decoded)
        recon_loss = tf.cast(recon_loss, dtype=tf.float32)
        # KL divergence
        kl_loss = -5e-4 * K.mean(1 + z_sigma - K.square(z_mu) - K.exp(z_sigma), axis=-1)
        kl_loss = tf.cast(kl_loss, dtype=tf.float32)
        return K.mean(recon_loss + kl_loss)

    # add the custom loss to the layer in call()
    def call(self, inputs):
        x = inputs[0]
        z_decoded = inputs[1]
        loss = self.vae_loss(x, z_decoded)
        self.add_loss(loss, inputs=inputs)
        return x
# # ================= #############
# # Encoder
#Let us define 4 conv2D, flatten and then dense
# # ================= ############
latent_dim = 256 # Number of latent dim parameters
input_img = Input(shape=input_shape, name='encoder_input')
print(input_img.shape)
x = Conv2D(32, 3, padding='same', activation='relu')(input_img)
print(x.shape)
x = Conv2D(64, 3, padding='same', activation='relu',strides=(2, 2))(x)
print(x.shape)
x = Conv2D(64, 3, padding='same', activation='relu')(x)
print(x.shape)
x = Conv2D(64, 3, padding='same', activation='relu')(x)
print(x.shape)
conv_shape = K.int_shape(x) #Shape of conv to be provided to decoder (taken after all the conv layers)
print(conv_shape)
#Flatten
x = Flatten()(x)
print(x.shape)
x = Dense(32, activation='relu')(x)
print(x.shape)
# Two outputs, for latent mean and log variance (std. dev.)
#Use these to sample random variables in latent space to which inputs are mapped.
z_mu = Dense(latent_dim, name='latent_mu')(x) #Mean values of encoded input
z_sigma = Dense(latent_dim, name='latent_sigma')(x) #Std dev. (variance) of encoded
z_mu = tf.cast(z_mu, dtype=tf.float32)
z_sigma = tf.cast(z_sigma, dtype=tf.float32)
print('z_mu.dtype:', z_mu.dtype)
print('z_sigma.dtype:', z_sigma.dtype)
# sample a vector from the latent distribution
# z is the Lambda custom layer we are adding for gradient descent calculations
# using mu and variance (sigma)
z = Lambda(sample_z, output_shape=(latent_dim, ), name='z')([z_mu, z_sigma])
print('z.dtype:', z.dtype)
#Z (lambda layer) will be the last layer in the encoder.
# Define and summarize encoder model.
encoder = Model(input_img, [z_mu, z_sigma, z], name='encoder')
print(encoder.summary())
# ================= ###########
# Decoder
#
# ================= #################
# decoder takes the latent vector as input
decoder_input = Input(shape=(latent_dim, ), name='decoder_input')
# Need to start with a shape that can be remapped to the original image shape, as
# we want our final output to be the same shape as the original input.
# So, add a dense layer with dimensions that can be reshaped to the desired output shape
x = Dense(conv_shape[1]*conv_shape[2]*conv_shape[3], activation='relu')(decoder_input)
# reshape to the shape of the last conv layer in the encoder, so we can upscale from there
x = Reshape((conv_shape[1], conv_shape[2], conv_shape[3]))(x)
# upscale (conv2D transpose) back to original shape
# use Conv2DTranspose to reverse the conv layers defined in the encoder
x = Conv2DTranspose(32, 3, padding='same', activation='relu',strides=(2, 2))(x)
#Can add more conv2DTranspose layers, if desired.
#Using sigmoid activation
x = Conv2DTranspose(num_channels, 3, padding='same', activation='sigmoid', name='decoder_output')(x)
# Define and summarize decoder model
decoder = Model(decoder_input, x, name='decoder')
decoder.summary()
# apply the decoder to the latent sample
z_decoded = decoder(z)
# apply the custom loss to the input images and the decoded latent distribution sample
y = CustomLayer()([input_img, z_decoded])
# y is basically the original image after encoding the input image to mu, sigma, z
# and decoding the sampled z values.
# This will be used as the output for the VAE
vae = Model(input_img, y, name='vae')
# Compile VAE
vae.compile(optimizer='adam', loss=None, experimental_run_tf_function=False)
vae.summary()
model_weights_dir = r'C:\Users\Daniel\Documents\GitHub\endo_git_v2\vae_training'
checkpoint_path = os.path.join(model_weights_dir, 'cp.ckpt')
print(os.listdir(model_weights_dir))
#vae.load_weights(checkpoint_path)
##################################################################
########## Open all WSI, then Open all Patches ##########
#for wsi in patch_folders: #loops through all the wsi folders
wsi = patch_folders[0]
#start of wsi loop
print('wsi:', wsi)
current_wsi_directory = os.path.join(patch_locs, wsi) #take the current wsi
print('current_wsi_directory:', current_wsi_directory)
patches = os.listdir(current_wsi_directory)
latent_shape = (203, 147, 256)
latent_wsi = np.zeros(latent_shape) #initialized placeholders for latent representations
row = 0
col = 0
for i in range(1):  # len(patches)  # should be 29841 every time
    # load the patch as a numpy array
    patch_path = os.path.join(current_wsi_directory, '{}_{}.jpeg'.format(wsi, i))  # numerical order, not alphabetical
    print('patch_path:', patch_path)
    image = Image.open(patch_path)
    data = np.asarray(image)
    # emulate a rescale of 1/255
    data = data / 255.
    data = np.expand_dims(data, axis=-1)
    print('data.shape:', data.shape)
    encoder(data, training=False)
Any help or tips are very much appreciated
I solved my issue. Long story short, I was passing in a numpy array that was (256, 256, 1) in size (note that the batch dimension was missing). Reshaping it to (1, 256, 256, 1) solved the problem (the first 1 is the batch dimension).
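For anyone hitting the same thing, a minimal sketch of the fix (reusing the variable names from the code above):
import numpy as np

data = np.asarray(image) / 255.
data = np.expand_dims(data, axis=-1)   # (256, 256, 1): add the channel axis
data = np.expand_dims(data, axis=0)    # (1, 256, 256, 1): add the missing batch axis
z_mu, z_sigma, z = encoder(data, training=False)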
I want to concatenate the output from an embedding layer with a custom tensor (myarr / myconst). I can specify everything with a fixed batch size as follows:
import numpy as np
import tensorflow as tf
BATCH_SIZE = 100
myarr = np.ones((10, 5))
myconst = tf.constant(np.tile(myarr, (BATCH_SIZE, 1, 1)))
# Model definition
inputs = tf.keras.layers.Input((10,), batch_size=BATCH_SIZE)
x = tf.keras.layers.Embedding(10, 5)(inputs)
x = tf.keras.layers.Concatenate(axis=1)([x, myconst])
model = tf.keras.models.Model(inputs=inputs, outputs=x)
However, if I don't specify the batch size and don't tile my array, i.e. just the following...
myarr = np.ones((10, 5))
myconst = tf.constant(myarr)
# Model definition
inputs = tf.keras.layers.Input((10,))
x = tf.keras.layers.Embedding(10, 5)(inputs)
x = tf.keras.layers.Concatenate(axis=1)([x, myconst])
model = tf.keras.models.Model(inputs=inputs, outputs=x)
... I get an error specifying that shapes [(None, 10, 5), (10, 5)] can't be concatenated. Is there a way to add this None / batch_size axis to avoid tiling?
Thanks in advance
You want to concatenate a constant of shape (10, 5) to a 3D tensor of shape (batch, 10, 5). To do this, your constant must be 3D: reshape it to (1, 10, 5) and repeat it along axis=0 so it matches the batch dimension, and then the concatenation can be performed.
We do this inside a Lambda layer:
import numpy as np
import tensorflow as tf

X = np.random.randint(0, 10, (100, 10))
Y = np.random.uniform(0, 1, (100, 20, 5))

myarr = np.ones((1, 10, 5)).astype('float32')
myconst = tf.convert_to_tensor(myarr)

def repeat_const(tensor, myconst):
    # repeat the constant along axis 0 as many times as there are batch elements
    shapes = tf.shape(tensor)
    return tf.repeat(myconst, shapes[0], axis=0)

inputs = tf.keras.layers.Input((10,))
x = tf.keras.layers.Embedding(10, 5)(inputs)
xx = tf.keras.layers.Lambda(lambda x: repeat_const(x, myconst))(x)
x = tf.keras.layers.Concatenate(axis=1)([x, xx])
model = tf.keras.models.Model(inputs=inputs, outputs=x)
model.compile('adam', 'mse')
model.fit(X, Y, epochs=3)
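As a quick sanity check (a short sketch using the model defined just above), the constant ends up repeated once per batch element:
out = model.predict(X[:7])
print(out.shape)   # expected (7, 20, 5): 10 embedding rows plus the 10 repeated constant rows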
I am trying to build a binary temporal image classifier by combining ResNet18 and an LSTM. However, I have never really used RNNs before and have been struggling to get the correct output shape.
I am using a batch size of 128 and a sequence size of 32. The images are 80x80 grayscale images.
The current model is:
class CNNLSTM(nn.Module):
    def __init__(self):
        super(CNNLSTM, self).__init__()
        self.resnet = models.resnet18(pretrained=False)
        self.resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3)
        self.resnet.fc = nn.Sequential(nn.Linear(in_features=512, out_features=256, bias=True))
        self.lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=3)
        self.fc1 = nn.Linear(256, 128)
        self.fc2 = nn.Linear(128, 1)

    def forward(self, x_3d):
        # x_3d: torch.Size([128, 32, 1, 80, 80])
        hidden = None
        toret = []
        for t in range(x_3d.size(1)):
            x = self.resnet(x_3d[:, t, :, :, :])
            out, hidden = self.lstm(x.unsqueeze(0), hidden)
            x = self.fc1(out[-1, :, :])
            x = F.relu(x)
            x = self.fc2(x)
            print("x shape: ", x.shape)
            toret.append(x)
        return torch.stack(toret)
This returns a tensor of shape torch.Size([32, 128, 1]), which, as I understand it, means the nth slice along the first dimension holds the output for the nth time step of every element in the batch.
How can I get an output of shape 128x1x32 instead?
And is there a better way to do this?
You could permute the dimensions:
a = torch.rand(32, 128, 1)
a = a.permute(1, 2, 0) # these are the indices of the original dimensions
print(a.shape)
>> torch.Size([128, 1, 32])
But you could also set batch_first=True in the LSTM module:
self.lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=3, batch_first=True)
The LSTM will then expect its input to have the shape batch-size x seq-len x features and will return its output in the same layout.
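A minimal runnable sketch of the batch_first variant (toy tensors only, not your full model):
import torch
import torch.nn as nn

# With batch_first=True the LSTM takes (batch, seq_len, features) and returns the same layout
lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=3, batch_first=True)
x = torch.rand(128, 1, 256)   # one 256-dim ResNet feature vector per sample, seq_len=1 per loop step
out, hidden = lstm(x)
print(out.shape)              # torch.Size([128, 1, 256])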
I have the following code trying to perform predictions on part of a ResNet model. However, I am getting an error.
def layer_input_shape(Model, layer_index):
    input_shape = np.array(Model.layers[layer_index - 1].output_shape)
    input_shape = np.ndarray.tolist(np.delete(input_shape, 0))
    return input_shape

def resnet50_Model(Model, trainable=True):
    input_shape = layer_input_shape(Model, 1)
    input = tf.keras.layers.Input(shape=input_shape)
    first_layer = Model.layers[0]
    first_layer.trainable = trainable
    out = first_layer(input)
    for i in range(1, 12):
        layer_i = Model.layers[i]
        layer_i.trainable = trainable
        out = layer_i(out)
    out = Conv2D(filters=2, kernel_size=2, strides=(2, 2), activation='relu')(out)
    out = Flatten()(out)
    out = Dense(units=2, activation='softmax')(out)
    result_model = tf.keras.models.Model(inputs=[input], outputs=out)
    return result_model
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

img = '/content/elephant.jpg'
img = image.load_img(img, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
preds = resnet_skip_model.predict(x)
print('Predicted:', decode_predictions(preds, top=3)[0])
I get the error below:
ValueError: `decode_predictions` expects a batch of predictions (i.e. a 2D array of shape (samples, 1000)). Found array with shape: (1, 3)
I had added a two-unit output Dense layer so I could predict only two classes, but decode_predictions expects the last Dense layer to have 1000 outputs, so I changed the units from two to 1000:
out = Dense(units=1000, activation='softmax')(out)
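If the model really should predict only two classes, an alternative sketch is to keep units=2 and skip decode_predictions, which is hard-wired to the 1000 ImageNet classes (the class names below are hypothetical placeholders):
import numpy as np

class_names = ['class_a', 'class_b']    # hypothetical labels for the two-class head
preds = resnet_skip_model.predict(x)    # shape (1, 2) with a 2-unit softmax
idx = int(np.argmax(preds, axis=-1)[0])
print('Predicted:', class_names[idx], 'with probability', float(preds[0, idx]))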
I want to try to implement the neural network architecture of the attached image: 1DCNN_model
Consider that I've got a dataset X which is (N_signals, 1500, 40), where 40 is the number of features over which I want to do the 1D convolution.
My Y is (N_signals, 1500, 2) and I'm working with keras.
Every 1D convolution needs to take one feature vector, like in this picture: 1DCNN_convolution
So it has to take one chunk of the 1500 time samples, pass it through the 1D convolutional layer (sliding along the time axis), then feed all the output features to the LSTM layer.
I tried to implement the first convolutional part with this code, but I'm not sure what it's doing. I can't understand how it can take in one chunk at a time (maybe I need to preprocess my input data first?):
input_shape = (None, 40)
model_input = Input(input_shape, name='input')
layer = model_input
convs = []
for i in range(n_chunks):
    conv = Conv1D(filters=40,
                  kernel_size=10,
                  padding='valid',
                  activation='relu')(layer)
    conv = BatchNormalization(axis=2)(conv)
    pool = MaxPooling1D(40)(conv)
    pool = Dropout(0.3)(pool)
    convs.append(pool)
out = Merge(mode='concat')(convs)
conv_model = Model(input=layer, output=out)
Any advice? Thank you very much
Thank you very much, I modified my code in this way:
input_shape = (1500, 40)
model_input = Input(shape=input_shape, name='input')
layer = model_input
layer = Conv1D(filters=40,
               kernel_size=10,
               padding='valid',
               activation='relu')(layer)
layer = BatchNormalization(axis=2)(layer)
layer = MaxPooling1D(pool_size=40,
                     padding='same')(layer)
layer = Dropout(self.params.drop_rate)(layer)
layer = LSTM(40, return_sequences=True,
             activation=self.params.lstm_activation)(layer)
layer = Dropout(self.params.lstm_dropout)(layer)
layer = Dense(40, activation='relu')(layer)
layer = BatchNormalization(axis=2)(layer)
model_output = TimeDistributed(Dense(2,
                                     activation='sigmoid'))(layer)
I was actually thinking that maybe I have to permute my axes in order to make the max-pooling layer work on my 40 mel-feature axis...
If you want to perform an individual 1D convolution over the 40 feature channels, you should add a dimension to your input:
(1500, 40, 1)
If you perform a 1D convolution on an input with shape
(1500, 40)
the filters are applied along the time dimension, and the pictures you posted indicate that this is not what you want to do.
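A minimal sketch of that reshaping step (only the input-side change, assuming X is the (N_signals, 1500, 40) array from the question):
import numpy as np
from tensorflow.keras.layers import Input

X = np.random.rand(8, 1500, 40).astype('float32')   # stand-in for the real dataset
X = np.expand_dims(X, axis=-1)                       # (N_signals, 1500, 40, 1)
print(X.shape)

# The model input then needs the matching shape
model_input = Input(shape=(1500, 40, 1), name='input')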