My goal is to build a convolutional autoencoder that encodes an input image into a flat vector of size (10, 1). I followed the example from the Keras documentation and modified it for my purposes. Unfortunately, a model like this:
input_img = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Flatten()(x)
encoded = Dense(units = 10, activation = 'relu')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
gives me
ValueError: Input 0 is incompatible with layer conv2d_39: expected ndim=4, found ndim=2
I think I should add a layer to my decoder to invert the effect of Flatten, but I wasn't really sure which one. Can you help?
Why do you want the vector to have specifically a (10, 1) shape?
You are then trying to run convolutions on it with a 3x3 kernel, which does not really make sense on a flat vector.
The shape a convolutional layer takes in has height, width, and channels. The output of the dense layer has to be reshaped, which can be done with a Reshape layer.
You can then reshape it to, for example, 5x2 with a single channel:
encoded = Reshape((5, 2, 1))(encoded)
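Note, though, that two 2x2 upsamplings from 5x2 only get you to 20x8, not 28x28. A minimal sketch of a decoder that actually gets back to 28x28, assuming the encoder above (two 2x2 poolings leave a 7x7x8 volume before Flatten; Dense and Reshape come from keras.layers):
x = Dense(7 * 7 * 8, activation='relu')(encoded)   # undo Dense(10)
x = Reshape((7, 7, 8))(x)                          # undo Flatten
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)                        # 7x7 -> 14x14
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)                        # 14x14 -> 28x28
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)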
I would like to train a convolutional neural network autoencoder on a CSV file. The CSV file contains the pixel positions and values of an original 1024x1024 image.
When I try to train it, I get the following error, which I haven't managed to resolve:
ValueError: Input 0 of layer max_pooling2d is incompatible with the layer: expected ndim=4, found ndim=5. Full shape received: (None, 1, 1024, 1024, 16). Any idea what I am doing wrong in my code?
Let me explain my code:
My csv file has this structure:
0 0 1.875223e+01
1 0 1.875223e+01
2 0 2.637685e+01
3 0 2.637685e+01
4 0 2.637685e+01
I managed to load my dataset and extract the x, y, and value columns as NumPy arrays:
x = data[0].values
y = data[1].values
values = data[2].values
Then I create an empty image with the correct dimensions and fill it with the pixel values:
image = np.empty((1024, 1024))
for xi, yi, value in zip(x, y, values):
    image[int(xi), int(yi)] = value
To use this array as input to my convolutional autoencoder, I reshaped it into a 4D array:
# Reshape the image array to a 4D tensor
image = image.reshape((1, image.shape[0], image.shape[1], 1))
Finally, I declare the convolutional autoencoder structure; at this stage I get the error `incompatible with the layer: expected ndim=4, found ndim=5. Full shape received: (None, 1, 1024, 1024, 16)`:
import keras
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
# Define the input layer
input_layer = Input(shape=(1,image.shape[1], image.shape[2], 1))
# Define the encoder layers
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_layer)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# Define the decoder layers
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
# Define the autoencoder model
autoencoder = Model(input_layer, decoded)
# Compile the model
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# Reshape the image array to a 4D tensor
image = image.reshape((1, image.shape[0], image.shape[1], 1))
# Train the model
autoencoder.fit(image, image, epochs=50, batch_size=1, shuffle=True)
You should drop the first dimension in the input layer:
input_layer = Input(shape=(image.shape[1], image.shape[2], 1))
The training set shape and the input layer shape are meant to be different: the input layer shape describes a single data point, while the training set shape describes the whole dataset, so it has one extra dimension whose size equals the number of data points.
You should also drop the second image.reshape, since at that point image.shape == (1, 1024, 1024, 1), and image = image.reshape((1, image.shape[0], image.shape[1], 1)) would try to reshape that into (1, 1, 1024, 1), which is impossible.
And lastly, you forgot to add padding='same' to one of the Conv2D layers in the decoder. Add it so that the output layer shape matches your training label shape.
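Putting the three fixes together, the lines that change would look like this (a sketch; everything else stays as in the question):
# Fixed input layer: the shape of a single sample, without the extra leading dimension
input_layer = Input(shape=(image.shape[1], image.shape[2], 1))
# Fixed decoder conv: padding='same' so the spatial size is preserved
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
# image is already (1, 1024, 1024, 1) here, so train directly without a second reshape
autoencoder.fit(image, image, epochs=50, batch_size=1, shuffle=True)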
All autoencoders that I have seen have the same input and output shapes. However, I need an autoencoder with inputs of shape (None, 32, 16, 3) [RGB images] and outputs of shape (None, 16, 16, 6) [a one-hot encoded representation of the image]. I have tried to use the Keras example for the MNIST dataset and adapt it to my use case, but I am getting the following error:
ValueError: `logits` and `labels` must have the same shape, received ((None, 32, 16, 6) vs (None, 16, 16, 6)).
Below is my auto-encoder architecture:
input = layers.Input(shape=(32, 16, 3))
# Encoder
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(input)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
# Decoder
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)
x = layers.Conv2D(6, (3, 3), activation="softmax", padding="same")(x)
# Autoencoder
autoencoder = Model(input, x)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.summary()
All I care about right now is to make it work without causing errors, and then I will proceed to fix other stuff like the loss function, etc. Is there any way to make this work, or should I just go for other deep learning architectures?
Theoretically, you can build your model exactly as you describe it and end up with a different shape on the output than on the input.
You only have to take into account that your input data is then no longer suitable as a training target; the target must have the same shape as the output of the network.
According to the error message, this is not the case here. The model has an output shape of (None, 32, 16, 6). However, the target is data with the shape (None, 16, 16, 6).
To solve this, the network or its layers have to be adjusted so that the two shapes match each other.
The output of autoencoder.summary() shows the final shape clearly.
For example, a possible network that has the right output shape looks like this:
input = layers.Input(shape=(32, 16, 3))
# Encoder
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(input)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(x)
x = layers.MaxPooling2D((4, 2), padding="same")(x)
# Decoder
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)
x = layers.Conv2D(6, (3, 3), activation="softmax", padding="same")(x)
# Autoencoder
autoencoder = Model(input, x)
autoencoder.summary()
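For reference, this is how the shapes evolve through that network (a quick trace for the (32, 16, 3) input):
# (32, 16, 3)   input
# (16,  8, 32)  after MaxPooling2D((2, 2))
# ( 4,  4, 32)  after MaxPooling2D((4, 2))
# ( 8,  8, 32)  after the first Conv2DTranspose (stride 2)
# (16, 16, 32)  after the second Conv2DTranspose (stride 2)
# (16, 16, 6)   after the final Conv2D -- matches the target shape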
I have built a CNN autoencoder using Keras, and it worked fine on the MNIST test data set. I am now trying it with a different data set collected from another source. These are raw images, which I read in using cv2; that works fine. I then convert the images into a NumPy array, which I think also works fine. But when I call the .fit method, it gives me this error:
Error when checking target: expected conv2d_39 to have shape (100, 100, 1) but got array with shape (100, 100, 3)
I tried converting the images to grayscale, but then they have the shape (100, 100) and not (100, 100, 1), which is what the model wants. What am I doing wrong here?
Here is the code that I am using:
def read_in_images(path):
    images = []
    for files in os.listdir(path):
        img = cv2.imread(os.path.join(path, files))
        if img is not None:
            images.append(img)
    return images
train_images = read_in_images(train_path)
test_images = read_in_images(test_path)
x_train = np.array(train_images)
x_test = np.array(test_images) # (36, 100, 100, 3)
input_img = Input(shape=(100,100,3))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(168, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
autoencoder.fit(x_train, x_train,
                epochs=25,
                batch_size=128,
                shuffle=True,
                validation_data=(x_test, x_test),
                callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])
The model works fine with the MNIST data set but not with my own data set. Any help will be appreciated.
Your input and output shapes are different; that is what triggers the error.
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
should be
decoded = Conv2D(num_channels, (3, 3), activation='sigmoid', padding='same')(x)
I ran some tests, and with the data loaded in grayscale like this:
img = cv2.imread(os.path.join(path, files), 0)
then expanding the dimensions of the final loaded array:
x_train = np.expand_dims(x_train, -1)
and finally normalizing your data with a simple:
x_train = x_train / 255.
with the input of your model set to input_img = Input(shape=(100, 100, 1)), the loss becomes normal again and the model runs well!
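Putting those pieces together, the loading code might look like this (a sketch, reusing read_in_images from the question with cv2's grayscale flag):
def read_in_images(path):
    images = []
    for files in os.listdir(path):
        img = cv2.imread(os.path.join(path, files), 0)  # 0 = load as grayscale
        if img is not None:
            images.append(img)
    return images

x_train = np.array(read_in_images(train_path))  # (36, 100, 100)
x_train = np.expand_dims(x_train, -1)           # (36, 100, 100, 1)
x_train = x_train / 255.                        # normalize to [0, 1]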
UPDATE after comment
In order to keep all three RGB channels throughout the network, you need an output that matches your input shape.
Here, since you want images of shape (100, 100, 3), you need an output of (100, 100, 3) from your decoder.
The line decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x) shrinks the output to shape (100, 100, 1).
So you simply need to change the number of filters; here we want 3 color channels, so the conv must look like this:
decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
I have images with shape (3600, 3600, 3). I'd like to use an autoencoder on them. My code is:
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import backend as K
from keras.preprocessing.image import ImageDataGenerator
input_img = Input(shape=(3600, 3600, 3))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
batch_size=2
datagen = ImageDataGenerator(rescale=1. / 255)
# dimensions of our images.
img_width, img_height = 3600, 3600
train_data_dir = 'train'
validation_data_dir = 'validation'
generator_train = datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
)
generator_valid = datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode=None,
    shuffle=False)
autoencoder.fit_generator(generator=generator_train,
                          validation_data=generator_valid)
When I run the code I get this error message:
ValueError: Error when checking target: expected conv2d_21 to have 4 dimensions, but got array with shape (26, 1)
I know the problem is somewhere in the shape of the layers, but I couldn't find it. Can someone please help me and explain the solution?
There are the following issues in your code:
Pass class_mode='input' to the flow_from_directory method so that the input images are also used as the labels (since you are building an autoencoder).
Pass padding='same' to the third Conv2D layer in the decoder:
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
Use three filters in the last layer, since your images are RGB:
decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
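Putting the fixes together, the data pipeline part might look like this (a sketch; batch_size and the directory names are taken from the question):
generator_train = datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='input')  # yields (images, images) pairs for the autoencoder
generator_valid = datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='input',
    shuffle=False)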
I want to compare two images.
The approach I adopt is to encode both of them.
The angle between the two encoded vectors is then calculated as a similarity measure.
The code below is used to encode and then decode images using a CNN with Keras.
However, I need to get the values of the tensor encoded.
How can I achieve that?
Thank you very much.
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import backend as K
input_img = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
#----------------------------------------------------------------#
# How to get the values of the tensor "encoded"? #
#----------------------------------------------------------------#
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
.....
autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=128,
                shuffle=True,
                validation_data=(x_test, x_test),
                callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])
In order to get an intermediate output, you need to create a separate model that contains the computation graph up to that point. In your case, you can do:
encoder = Model(input_img, encoded)
After training the autoencoder is complete, you can call encoder.predict, which will return the intermediate encoded result. You can also save the two models separately, as you would any other model, and avoid retraining every time. In short, a Model is a container for layers that constructs a computation graph.
If I understand your question correctly, you would like to get the 128-dimensional encoded representation from the convolutional autoencoder for image comparison?
What you could do is keep a reference to the encoder part of the network, train the whole autoencoder, and then encode the images using the weights of that encoder reference.
Put this after autoencoder.compile() (note that get_layer('encoded') requires naming the bottleneck layer, e.g. MaxPooling2D((2, 2), padding='same', name='encoded')):
encoder = Model(inputs=autoencoder.input, outputs=autoencoder.get_layer('encoded').output)
and create encodings with:
encoded_img = encoder.predict(input)
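Since the goal is to measure the angle between two encoded vectors, a minimal similarity check on top of that encoder could look like this (a sketch; img_a and img_b are assumed to be preprocessed arrays of shape (1, 28, 28, 1)):
import numpy as np

# Flatten the (4, 4, 8) encodings into 128-dimensional vectors
code_a = encoder.predict(img_a).ravel()
code_b = encoder.predict(img_b).ravel()

# Cosine similarity = cos(angle between the two vectors)
cos_sim = np.dot(code_a, code_b) / (np.linalg.norm(code_a) * np.linalg.norm(code_b))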