I have to carry out a project on image reconstruction. I have a 1024x1024 image of a place before a natural disaster, and the same image after the disaster. The goal is to reconstruct the post-disaster image using an autoencoder.
First, I split the 1024x1024 image into tiles of size 16x16. Then I created the layers for the autoencoder.
I guess I need to fit my model on every tile to compute the weights, but can I? For example, by writing a for loop that fits the model on each tile.
When I run the prediction, it is wrong, and I get the same result for each tile.
I have no clue how to do this, and I'm new to image processing and deep learning, so I need help.
I'm sharing some pieces of code here. Thanks!
My autoencoder layers are below:
input_layer = layers.Input(shape= (16,16,1))
encoded = layers.Conv2D(64, (4,4), activation = 'relu' , padding = 'same')(input_layer)
encoded = layers.MaxPooling2D((2,2), padding='same')(encoded)
encoded_1 = layers.Conv2D(32, (4,4), activation = 'relu' , padding = 'same')(encoded)
encoded_1 = layers.MaxPooling2D((2,2), padding='same')(encoded_1)
encoded_2 = layers.Conv2D(16, (4,4), activation = 'relu' , padding = 'same')(encoded_1)
encoded_2 = layers.MaxPooling2D((2,2), padding='same')(encoded_2)
encoded_3 = layers.Conv2D(8, (4,4), activation = 'relu' , padding = 'same')(encoded_2)
encoded_3 = layers.MaxPooling2D((2,2), padding='same')(encoded_3)
encoding_layer = layers.Conv2D(4,(4,4), activation = 'relu', padding='same')(encoded_3)
decoded = layers.Conv2D(8, (4,4), activation='relu', padding='same')(encoding_layer)
decoded = layers.UpSampling2D((2,2))(decoded)
decoded_1 = layers.Conv2D(16,(4,4), activation = 'relu', padding='same')(decoded)
decoded_1 = layers.UpSampling2D((2,2))(decoded_1)
decoded_2 = layers.Conv2D(32, (4,4), activation='relu', padding='same')(decoded_1)
decoded_2 = layers.UpSampling2D((2,2))(decoded_2)
decoded_3 = layers.Conv2D(64,(4,4), activation = 'relu', padding='same')(decoded_2)
decoded_3 = layers.UpSampling2D((2,2))(decoded_3)
decoding_layer = layers.Conv2D(1,(4,4), activation='sigmoid', padding='same')(decoded_3)
autoencoder = models.Model(input_layer, decoding_layer)
My training block is below:
number_epochs = 25
autoencoder.compile('adam', loss='binary_crossentropy')
for i in range(0,64):
    for j in range(0,64):
        history = autoencoder.fit(x = bands_split_train[i,j], y = bands_split_test[i,j], batch_size = 256, epochs=number_epochs,shuffle=True)
The bands_split_train variable is an array with a shape of (64,64,16,16,1).
And below is how I predict one tile:
predicted_tiles = autoencoder.predict(bands_split_train[0,0])
With that, I get the same result for every single tile in the image.
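One thing that may matter here: Keras treats the first axis of `x` as the sample axis, so `bands_split_train[i,j]`, with shape (16,16,1), is not read as a batch containing one tile. A minimal sketch, assuming the tile grid really has the shape (64,64,16,16,1) stated above, flattens the grid into one batch of 4096 tiles and shows how to stitch a batch back into an image. The helper names `tiles_to_batch` and `batch_to_image` are hypothetical, and random data stands in for the real tiles.

```python
import numpy as np


def tiles_to_batch(tiles):
    """Flatten a (rows, cols, h, w, c) tile grid into a (rows*cols, h, w, c) batch."""
    rows, cols, h, w, c = tiles.shape
    return tiles.reshape(rows * cols, h, w, c)


def batch_to_image(batch, rows, cols):
    """Reassemble a (rows*cols, h, w, c) batch into a (rows*h, cols*w, c) image."""
    _, h, w, c = batch.shape
    grid = batch.reshape(rows, cols, h, w, c)
    # Move the tile-row axis next to the pixel-row axis before merging them.
    return grid.transpose(0, 2, 1, 3, 4).reshape(rows * h, cols * w, c)


# Stand-in data with the same shape as bands_split_train (assumption: values in [0, 1]).
rng = np.random.default_rng(0)
bands_split_train = rng.random((64, 64, 16, 16, 1)).astype("float32")

x = tiles_to_batch(bands_split_train)
print(x.shape)      # (4096, 16, 16, 1)

image = batch_to_image(x, rows=64, cols=64)
print(image.shape)  # (1024, 1024, 1)
```

With the tiles in this layout, a single `autoencoder.fit(x, y, ...)` call over all 4096 tiles (with `y` built the same way from `bands_split_test`) would replace the nested loop, and a single tile could be predicted with `bands_split_train[0:1, 0]`-style slicing that keeps the batch axis. This is a sketch of the data layout only, not a tested training recipe.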
I have a task for my project paper and I do not understand how to train the model. The model is supposed to take an image and segment it into different classes. The hard part is that the different segments look the same, but I would like to differentiate between them. When I try to build a model with convolutional layers and an LSTM, the model only predicts the background class.
Here is my model:
def LSTMconv10x9(input_size = (200, 9, 10, 1)):
    input = Input(input_size)
    conv1 = TimeDistributed(Conv2D(32, 3, padding = "same", activation='relu'))(input)
    conv2 = TimeDistributed(Conv2D(64, 3, padding = "same", activation='relu'))(conv1)
    lstm = ConvLSTM2D(32, 3, return_sequences=True, padding="same", activation="softmax")(conv2)
    conv4 = TimeDistributed(Conv2D(64,3, padding = 'same', activation='relu'))(lstm)
    conv5 = TimeDistributed(Conv2D(32,3, padding = 'same', activation='relu'))(conv4)
    output = ConvLSTM2D(11,1, return_sequences = True, padding = "same", activation = None)(conv5)
    model = Model(inputs = input, outputs = output)
    model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), optimizer = tf.keras.optimizers.Adam(),
                  metrics=["accuracy"], sample_weight_mode='temporal')
    # Return the compiled model so the caller can fit it
    return model
And the way I train the model:
weights = np.where(train_y == 0, 0.1, 0.9)
model1 = LSTMconv10x9simple()
model1.fit(train_x,train_y,epochs=20, batch_size=32,validation_data=(test_x, test_y),sample_weight=weights)
The training set has shape (2000,200,9,10,1) and the validation set has shape (1000,200,9,10,1), so the training set contains 2000 videos of 200 frames each. The videos show 10 structures that look the same, but I would like to number them so that they are treated as different structures. This is a segmentation problem.
The data is very unbalanced: each video contains objects that I want to separate, but the background makes up about 90% of the videos. I have tried weighting the samples with sample_weight_mode='temporal' in TensorFlow, but it did not seem to work. The most important thing for the model is to find the structures.
Does anyone have any solutions to my problems?
I'm trying to create an A3C agent to play a game using frames as input.
I think my A3C could benefit from having some form of memory layer, like an LSTM layer.
From what I understand of how an LSTM works, you have to give it data in batches, and the memory only works on what is in the batch.
Unfortunately, I cannot give the whole replay in one batch because the batch would be far too big. So I wanted to know whether it is possible to create a memory layer that works similarly to an LSTM layer. What I had in mind would generate some values from the output of a layer of the network, decide whether it is worth saving those values or keeping the previous ones, and then feed this memory to the next layer of the network.
S = Input(shape = (self.IMAGE_ROWS, self.IMAGE_COLS, self.IMAGE_CHANNELS, ), name = 'Input')
h0 = Convolution2D(1, kernel_size = (8,8), strides = (4,4), activation = 'relu', kernel_initializer = 'random_uniform', bias_initializer = 'random_uniform')(S)
h1 = Convolution2D(1, kernel_size = (4,4), strides = (2,2), activation = 'relu', kernel_initializer = 'random_uniform', bias_initializer = 'random_uniform')(h0)
h2 = Flatten()(h1)
h3 = Dense(256, activation = 'relu', kernel_initializer = 'random_uniform', bias_initializer = 'random_uniform') (h2)
h3 = Dropout(0.5)(h3)
# I was thinking of adding the memory layer here
# It would take the values of h3 and the output would feed h4_k with the values of h3
h4_k = Dense(256, activation = 'relu', kernel_initializer = 'random_uniform', bias_initializer = 'random_uniform') (h3)
h4_k = Dropout(0.5)(h4_k)
h5_k = Dense(256, activation = 'relu', kernel_initializer = 'random_uniform', bias_initializer = 'random_uniform') (h4_k)
h5_k = Dropout(0.5)(h5_k)
probs_k = Dense(self.n_actions_k, activation = 'softmax')(h5_k)
values_k = Dense(1, activation = 'linear')(h5_k)
Does this kind of layer already exist? If not, how can I create a custom layer in TensorFlow that can choose whether or not to update its values?
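To make the idea concrete, the "keep the old values or save the new ones" decision can be written as a soft gate, which is roughly what the update gate in a GRU does. The sketch below is plain Python, not a TensorFlow layer: the class name `GatedMemory` is hypothetical, and the gate logits are passed in by hand, whereas in a real layer they would be computed from learned weights.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


class GatedMemory:
    """Keeps a state vector that persists across calls to update()."""

    def __init__(self, size):
        self.state = [0.0] * size

    def update(self, candidate, gate_logits):
        # A gate near 1 overwrites the slot with the candidate value;
        # a gate near 0 keeps the previously stored value.
        gates = [sigmoid(z) for z in gate_logits]
        self.state = [
            g * c + (1.0 - g) * s
            for g, c, s in zip(gates, candidate, self.state)
        ]
        return self.state


memory = GatedMemory(size=2)
# First slot: gate wide open, so the candidate is stored.
# Second slot: gate closed, so the old value (0.0) is kept.
memory.update(candidate=[1.0, 1.0], gate_logits=[10.0, -10.0])
print(memory.state)
```

Because the state lives on the object rather than in the batch, it survives from one call to the next. Keras recurrent layers offer something similar through their `stateful` argument, which may be worth checking before writing a custom layer.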
I'm new to PyTorch. Here's the architecture of a TensorFlow model that I'd like to convert into a PyTorch model.
I have written most of the code but am confused about a few places.
1) In TensorFlow, the Conv2D function takes the filters as an input. In PyTorch, however, the function takes the number of input channels and output channels as inputs. So how do I find the equivalent number of input and output channels, given the filters?
2) In TensorFlow, the dense layer has a parameter called 'nodes'. In PyTorch, however, the same layer has two different inputs (the size of the input and the size of the output). How do I determine them from the number of nodes?
Here's the tensorflow code.
from keras.utils import to_categorical
from keras.models import Sequential, load_model
from keras.layers import Conv2D, MaxPool2D, Dense, Flatten, Dropout
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu', input_shape=X_train.shape[1:]))
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(rate=0.25))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(rate=0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(rate=0.5))
model.add(Dense(43, activation='softmax'))
Here's my code:
import torch
import torch.nn as nn
import torch.nn.functional as F
# The network should inherit from the nn.Module
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Define 2D convolution layers
        # 3: input channels, 32: output channels, 5: kernel size, 1: stride
        self.conv1 = nn.Conv2d(3, 32, 5, 1) # The size of input channel is 3 because all images are coloured
        self.conv2 = nn.Conv2d(32, 64, 5, 1)
        self.conv3 = nn.Conv2d(64, 128, 3, 1)
        self.conv4 = nn.Conv2d(128, 256, 3, 1)
        # It will 'filter' out some of the input by the probability(assign zero)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        # Fully connected layer: input size, output size
        self.fc1 = nn.Linear(36864, 128)
        self.fc2 = nn.Linear(128, 10)
    # forward() link all layers together,
    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = self.conv3(x)
        x = F.relu(x)
        x = self.conv4(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
Thanks in advance!
1) In PyTorch, you pass the input channels and output channels as arguments. In your first layer, the number of input channels is the number of color channels in your image. After that, it is always the same as the number of output channels of the previous layer (the output channels are specified by the filters parameter in TensorFlow).
2) PyTorch is slightly annoying in that, when flattening your conv outputs, you have to calculate the shape yourself. You can either use an equation to calculate this, Out = (W − F + 2P)/S + 1 (W: input size, F: kernel size, P: padding, S: stride), or write a shape-calculating function that passes a dummy image through the conv part of the network and reads off the resulting shape. This value is your input-size argument; the output-size argument is just the number of nodes you want in your next fully connected layer.
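As a worked example of the equation, the sketch below walks the Keras model from the question layer by layer. The input size of 30x30 is an assumption made only for illustration (the question does not give `X_train.shape`), and the helper names are hypothetical. Keras `Conv2D` defaults to padding 'valid', so P = 0 throughout, and each 2x2 pooling layer is treated as a kernel of 2 with stride 2.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Out = (W - F + 2P) / S + 1, rounded down as conv and pooling layers do."""
    return (size - kernel + 2 * padding) // stride + 1


def flattened_features(input_size, out_channels, layers):
    """Spatial size after a stack of (kernel, padding, stride) layers, times channels."""
    size = input_size
    for kernel, padding, stride in layers:
        size = conv_output_size(size, kernel, padding, stride)
    return out_channels * size * size


# (kernel, padding, stride) for each conv and pooling layer of the Keras model.
keras_layers = [
    (5, 0, 1),  # Conv2D(32, 5x5)
    (5, 0, 1),  # Conv2D(32, 5x5)
    (2, 0, 2),  # MaxPool2D(2x2)
    (3, 0, 1),  # Conv2D(64, 3x3)
    (3, 0, 1),  # Conv2D(64, 3x3)
    (2, 0, 2),  # MaxPool2D(2x2)
]

# Assumed 30x30 input; the last conv layer has 64 output channels.
n_features = flattened_features(30, out_channels=64, layers=keras_layers)
print(n_features)  # 576
```

Under that assumption, the first fully connected layer would be `nn.Linear(576, 256)` and the second `nn.Linear(256, 43)`, matching `Dense(256)` and `Dense(43)`. With a different input size the first number changes, which is why it has to be computed rather than guessed.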
I'm just a beginner in this field.
I want to apply a Laplacian filter to an image that is imported from Keras.
What should I do? If you know of a good document, please share it.
model = Sequential([
    layers.Conv2D(32,(3,3), activation='relu', padding = 'same', input_shape=(28,28,1)),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64,(3,3), activation='relu', padding = 'same'),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64,(3,3), activation='relu', padding = 'same'),
    layers.Flatten(),
    Dense(64,activation = 'relu'),
    Dense(10,activation='softmax')
])
model.summary()
input2 = Input(shape=(28,28,1))
Cv2D1=model.layers[0](input2)
MP1=model.layers[1](Cv2D1)
Cv2D2=model.layers[2](MP1)
MP2=model.layers[3](Cv2D2)
Cv2D3=model.layers[4](MP2)
flt=model.layers[5](Cv2D3)
D1=model.layers[6](flt)
D2=model.layers[7](D1)
model1 = Model(input2,D2)
This is part of the code I have.
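For reference, a Laplacian filter is a small fixed kernel slid over the image. The sketch below applies the common 4-neighbour kernel to a 2-D NumPy array without padding, so the output is two pixels smaller in each direction. It is independent of the Keras model above; the function name `apply_laplacian` and the toy 5x5 image are made up for illustration.

```python
import numpy as np

# 4-neighbour Laplacian kernel: responds to intensity changes, zero on flat areas.
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=np.float32)


def apply_laplacian(image):
    """Slide the 3x3 kernel over a 2-D image (no padding, stride 1)."""
    image = np.asarray(image, dtype=np.float32)
    rows, cols = image.shape
    out = np.zeros((rows - 2, cols - 2), dtype=np.float32)
    for i in range(rows - 2):
        for j in range(cols - 2):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * LAPLACIAN)
    return out


# Toy image: a single bright pixel in the centre of a dark 5x5 patch.
image = np.zeros((5, 5), dtype=np.float32)
image[2, 2] = 1.0
print(apply_laplacian(image))
```

For a 28x28x1 image such as the ones the model expects, the same function could be called on `img[..., 0]`. Whether the filter should be a preprocessing step or a fixed layer inside the model is a separate design choice.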
I am new to TensorFlow and semantic segmentation.
I am designing a U-Net for semantic segmentation. Each image has one object that I want to classify, but in total I have images of 10 different objects. I am confused about how to prepare my mask input. Is this considered multi-label segmentation, or segmentation for only one class?
Should I convert my input to one-hot encoding? Should I use to_categorical? I have found examples for multi-class segmentation, but I don't know whether that applies here, because each image contains only one object to detect/classify.
I tried using this as my input code, but I am not sure whether what I am doing is right.
#Generation of batches of image and mask
class DataGen(keras.utils.Sequence):
    def __init__(self, image_names, path, batch_size, image_size=128):
        self.image_names = image_names
        self.path = path
        self.batch_size = batch_size
        self.image_size = image_size
    def __load__(self, image_name):
        # Path
        image_path = os.path.join(self.path, "images/aug_test", image_name) + ".png"
        mask_path = os.path.join(self.path, "masks/aug_test",image_name) + ".png"
        # Reading Image
        image = cv2.imread(image_path, 1)
        image = cv2.resize(image, (self.image_size, self.image_size))
        # Reading Mask
        mask = cv2.imread(mask_path, -1)
        mask = cv2.resize(mask, (self.image_size, self.image_size))
        ## Normalizing
        image = image/255.0
        mask = mask/255.0
        return image, mask
    def __getitem__(self, index):
        if (index+1)*self.batch_size > len(self.image_names):
            self.batch_size = len(self.image_names) - index*self.batch_size
        image_batch = self.image_names[index*self.batch_size : (index+1)*self.batch_size]
        image = []
        mask = []
        for image_name in image_batch:
            _img, _mask = self.__load__(image_name)
            image.append(_img)
            mask.append(_mask)
        #This is where I am defining my input
        image = np.array(image)
        mask = np.array(mask)
        mask = tf.keras.utils.to_categorical(mask, num_classes=10, dtype='float32') #Is this true?
        return image, mask
    def __len__(self):
        return int(np.ceil(len(self.image_names)/float(self.batch_size)))
Is this correct? If it is, what should I change in my input to get the label/class as output? Should I change the pixel values of my mask according to my class?
Here is my U-Net architecture.
# Convolution and deconvolution Blocks
def down_scaling_block(x, filters, kernel_size=(3, 3), padding="same", strides=1):
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(x)
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(conv)
    pool = keras.layers.MaxPool2D((2, 2), (2, 2))(conv)
    return conv, pool
def up_scaling_block(x, skip, filters, kernel_size=(3, 3), padding="same", strides=1):
    conv_t = keras.layers.UpSampling2D((2, 2))(x)
    concat = keras.layers.Concatenate()([conv_t, skip])
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(concat)
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(conv)
    return conv
def bottleneck(x, filters, kernel_size=(3, 3), padding="same", strides=1):
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(x)
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(conv)
    return conv
def UNet():
    filters = [16, 32, 64, 128, 256]
    inputs = keras.layers.Input((image_size, image_size, 3))
    '''inputs2 = keras.layers.Input((image_size, image_size, 1))
    conv1_2, pool1_2 = down_scaling_block(inputs2, filters[0])'''
    Input = inputs
    conv1, pool1 = down_scaling_block(Input, filters[0])
    conv2, pool2 = down_scaling_block(pool1, filters[1])
    conv3, pool3 = down_scaling_block(pool2, filters[2])
    '''conv3 = keras.layers.Conv2D(filters[2], kernel_size=(3,3), padding="same", strides=1, activation="relu")(pool2)
    conv3 = keras.layers.Conv2D(filters[2], kernel_size=(3,3), padding="same", strides=1, activation="relu")(conv3)
    drop3 = keras.layers.Dropout(0.5)(conv3)
    pool3 = keras.layers.MaxPooling2D((2,2), (2,2))(drop3)'''
    conv4, pool4 = down_scaling_block(pool3, filters[3])
    bn = bottleneck(pool4, filters[4])
    deConv1 = up_scaling_block(bn, conv4, filters[3]) #8 -> 16
    deConv2 = up_scaling_block(deConv1, conv3, filters[2]) #16 -> 32
    deConv3 = up_scaling_block(deConv2, conv2, filters[1]) #32 -> 64
    deConv4 = up_scaling_block(deConv3, conv1, filters[0]) #64 -> 128
    outputs = keras.layers.Conv2D(10, (1, 1), padding="same", activation="softmax")(deConv4)
    model = keras.models.Model(inputs, outputs)
    return model
model = UNet()
model.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["acc"])
train_gen = DataGen(train_img, train_path, image_size=image_size, batch_size=batch_size)
valid_gen = DataGen(valid_img, train_path, image_size=image_size, batch_size=batch_size)
test_gen = DataGen(test_img, test_path, image_size=image_size, batch_size=batch_size)
train_steps = len(train_img)//batch_size
valid_steps = len(valid_img)//batch_size
model.fit_generator(train_gen, validation_data=valid_gen, steps_per_epoch=train_steps, validation_steps=valid_steps,
                    epochs=epochs)
I hope that I explained my question properly. Any help is appreciated!
UPDATE: I changed the value of each mask pixel according to the object class. (If the image contains an object that I want to classify as object no. 2, I changed the mask pixel values to 2, so the whole mask array contains 0 (background) and 2 (object). Accordingly, for other objects the mask contains 0 and 3, 0 and 10, etc.)
Here I first converted the mask to binary, and then, where the pixel value is greater than 1, I changed it to 1 or 2 or 3 (according to the object/class no.).
Then I converted the masks to one-hot with to_categorical as shown in my code. Training runs, but the network doesn't learn anything. Accuracy and loss keep swinging between two values. What is my mistake here? Am I making a mistake when generating the mask (changing the pixel values), or in the call to to_categorical?
PROBLEM FOUND:
I was making an error while creating the mask. I was reading the image with cv2, which reads an image as height x width, but I was creating the mask with per-class pixel values while treating the image dimensions as width x height. That was causing the problem and preventing the network from learning anything. It is working now.
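The axis-order pitfall can be reproduced without any image file. Arrays returned by cv2.imread are indexed as (row, column), that is (height, width), whereas cv2.resize takes its target size as (width, height), so the two conventions are easy to mix up. A minimal sketch with made-up sizes and a made-up class id:

```python
import numpy as np

height, width = 4, 6   # a small non-square "image" so the two axes differ
class_id = 2           # hypothetical class number for the object

# Correct: the mask has the same (height, width) layout as the image array.
mask = np.zeros((height, width), dtype=np.uint8)
mask[1:3, 3:6] = class_id   # rows 1..2, columns 3..5 belong to the object

# Wrong: swapping the axes gives a mask that no longer lines up with the image.
swapped = np.zeros((width, height), dtype=np.uint8)

print(mask.shape)     # (4, 6)
print(swapped.shape)  # (6, 4)
print(np.unique(mask))  # [0 2]
```

With square images the shapes match either way, so a swapped mask raises no error and only shows up as a network that does not learn, which is consistent with the symptom described above.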
Each image has one object that I want to classify. But in total I have images of 10 different objects. I am confused, how can I prepare my mask input? Is it considered as multi-label segmentation or only for one class?
If your dataset has N different labels (e.g. 0 - background, 1 - dogs, 2 - cats, ...), you have a multi-class problem, even if each image contains only one kind of object.
Should I convert my input to one hot encoded? Should I use to_categorical?
Yes, you should one-hot encode your labels. Whether you can use to_categorical directly depends on the source format of your labels. Say you have N classes and your labels are (height, width, 1), where each pixel has a value in the range [0,N). In that case keras.utils.to_categorical(label, N) will provide a float (height,width,N) label, where each pixel is 0 or 1. And you don't have to divide by 255.
If your source format is different, you may have to use a custom function to get the same output format.
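As an illustration of the target format, here is a NumPy-only sketch of what such a custom function could look like. The name `one_hot_mask` is hypothetical, and the mask is assumed to hold integer class ids in the range [0, N).

```python
import numpy as np


def one_hot_mask(mask, num_classes):
    """Turn an integer mask (h, w) or (h, w, 1) into a float (h, w, num_classes) array."""
    mask = np.asarray(mask)
    if mask.ndim == 3 and mask.shape[-1] == 1:
        mask = mask[..., 0]          # drop the trailing channel axis
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype=np.float32)[mask]


# Toy 2x3 mask: background (0) plus one object of class 2.
mask = np.array([[0, 0, 2],
                 [0, 2, 2]], dtype=np.uint8)
encoded = one_hot_mask(mask, num_classes=3)
print(encoded.shape)       # (2, 3, 3)
print(encoded[0, 2])       # [0. 0. 1.]
```

Note that if class 0 is the background and the objects are numbered 1 to N, the values run from 0 to N, so the number of classes passed to the encoder (and the number of output channels of the network) would be N + 1.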
Check out this repo (not my work): keras-unet. The notebooks folder contains two examples that train a U-Net on small datasets. They are not multi-class, but it is easy to go step by step and use your own dataset. Start by loading your labels as:
im = Image.open(mask).resize((512,512))
im = to_categorical(im,NCLASSES)
reshape and normalize like this:
x = np.asarray(imgs_np, dtype=np.float32)/255
y = np.asarray(masks_np, dtype=np.float32)
y = y.reshape(y.shape[0], y.shape[1], y.shape[2], NCLASSES)
x = x.reshape(x.shape[0], x.shape[1], x.shape[2], 3)
adapt your model to NCLASSES
model = custom_unet(
    input_shape,
    use_batch_norm=False,
    num_classes=NCLASSES,
    filters=64,
    dropout=0.2,
    output_activation='softmax')
select the correct loss:
from keras.losses import categorical_crossentropy
model.compile(
    optimizer=SGD(lr=0.01, momentum=0.99),
    loss='categorical_crossentropy',
    metrics=[iou, iou_thresholded])
Hope it helps