Keras crop input image to origin that is predicted by model - python

I would like to build a model in Keras that predicts what regions of an image are important, using this model:
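from tensorflow import keras
from tensorflow.keras.layers import Conv2D, Dense, Flatten, LSTM, MaxPooling2D, RepeatVector, TimeDistributed
crop_points = keras.Sequential([
    Conv2D(8, (3,3), input_shape=(28, 28, 1)),
    MaxPooling2D(),
    Conv2D(8, (3,3)),
    MaxPooling2D(),
    Conv2D(8, (3,3)),
    Flatten(),
    Dense(16),
    RepeatVector(num_samples),
    LSTM(32, return_sequences=True),
    TimeDistributed(Dense(2))
])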
The model predicts a tensor of shape (num_samples, 2) containing the origins of the cropping regions. I would now like a Lambda layer (or a custom layer, if that works better) that crops the input image at each of those predicted origins, for further processing with another model. This needs to be done in Keras, because the model should be trained end-to-end and eventually exported to CoreML.
My current Lambda layer looks like this:
# Lambda layer
def crop_image(tensor):
    image = tensor[0]
    point = tensor[1]
    x_location = int(K.cast(point[0],"int32"))
    y_location = int(K.cast(point[1],"int32"))
    print("x: {}, y: {}".format(x,y))
    chunk = x_test[i][x_location:x_location + chunk_size, y_location:y_location + chunk_size]
    flattened = K.concatenate(chunk).ravel()
    flattened = K.append(chunk, x_location)
    flattened = K.append(chunk, y_location)
    #flattened = np.array(flattened)
    #return tf.convert_to_tensor(flattened)
My training data looks like this:
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train / 255
x_test = x_test / 255
However, Keras raises TypeError: int() argument must be a string, a bytes-like object or a number, not 'Tensor', because it cannot convert the symbolic tensors to the Python integers I need for cropping the image. How should I write my Lambda layer to crop the image dynamically based on the predicted origins?
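Slicing with Python ints cannot work on symbolic tensors, and it would not be differentiable with respect to the predicted origins anyway. One way around both problems, assuming bilinear sampling of the crop is acceptable instead of exact slicing, is tf.image.crop_and_resize inside the Lambda layer: it takes float box coordinates and has gradients with respect to the boxes. This is a sketch, not a tested CoreML pipeline; I have not checked whether coremltools converts this op. CHUNK_SIZE and NUM_SAMPLES are hypothetical values, and the origins are assumed to be pixel offsets along the first and second image axes, like x_location and y_location in the question.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

IMG_SIZE = 28     # MNIST height/width
CHUNK_SIZE = 8    # hypothetical crop size (the question's chunk_size)
NUM_SAMPLES = 3   # hypothetical; the question's num_samples


def crop_at_origins(inputs, chunk_size=CHUNK_SIZE, img_size=IMG_SIZE):
    """Crop chunk_size x chunk_size patches at predicted origins.

    inputs[0]: images,  (batch, H, W, C) float tensor
    inputs[1]: origins, (batch, num_samples, 2) float tensor of pixel offsets
               along the first and second image axes
    returns:   (batch, num_samples, chunk_size, chunk_size, C)
    """
    images, origins = inputs
    batch = tf.shape(images)[0]
    num_samples = tf.shape(origins)[1]
    channels = images.shape[-1]  # static channel count keeps the output shape known

    # crop_and_resize wants boxes as normalized [y1, x1, y2, x2], where 0 is the
    # first pixel and 1 the last pixel along each axis
    scale = float(img_size - 1)
    start = origins / scale                          # [y1, x1]
    end = (origins + (chunk_size - 1)) / scale       # [y2, x2]
    boxes = tf.reshape(tf.concat([start, end], axis=-1), [-1, 4])

    # box k belongs to image k // num_samples
    box_indices = tf.repeat(tf.range(batch), num_samples)

    # bilinear sampling: no int() conversion, and gradients flow to the boxes
    crops = tf.image.crop_and_resize(images, boxes, box_indices,
                                     crop_size=[chunk_size, chunk_size])
    return tf.reshape(crops, [batch, num_samples, chunk_size, chunk_size, channels])


# wiring it up as a Lambda layer; origins_in would be crop_points(images_in)
images_in = keras.Input(shape=(IMG_SIZE, IMG_SIZE, 1))
origins_in = keras.Input(shape=(NUM_SAMPLES, 2))
crops = keras.layers.Lambda(crop_at_origins)([images_in, origins_in])
cropper = keras.Model([images_in, origins_in], crops)
```

The LSTM in crop_points outputs unbounded values, so in practice the origins probably need to be squashed into [0, IMG_SIZE - CHUNK_SIZE] (for example with a scaled sigmoid); boxes outside the image are filled with crop_and_resize's extrapolation_value instead of raising an error.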

Related

Tensorflow convolutional autoencoders doesn't converge with my data

My problem is to create an autoencoder model to recognize anomalies on a cardboard-like surface. I know that I can use an autoencoder and train it using "good" samples, and later I can use it to recognize "bad" samples (i.e., anomalies).
I've built convolutional autoencoders in TensorFlow based on Building autoencoder in Keras and PyImageSearch autoencoders. Both examples use the MNIST dataset and they work perfectly (the loss decreases and accuracy rises to about 0.85). However, when I train both autoencoders on my custom pictures, the models don't converge: the training loss (I tried binary_crossentropy and mse, as stated on the websites) gets stuck at some level, e.g. 3.0873e-04 or 0.0016 (depending on the loss and the way the data is normalized), and accuracy is 0 or something like 1.2618e-07. Below is a sample network architecture.
import tensorflow as tf
from tensorflow.keras import layers, Model
input_layer = layers.Input(shape=(28, 28, 1))
data_augmentation = tf.keras.Sequential()
data_augmentation.add(tf.keras.layers.RandomFlip('horizontal'))
data_augmentation.add(tf.keras.layers.RandomFlip('vertical'))
# note: the second positional argument is `training`, so passing 0 (False) disables the random flips
x = data_augmentation(input_layer, 0)
x = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
# the block below was added once and removed later to reduce network depth
x = layers.Conv2D(8, (3, 3), padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), padding='same', activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
# the block below was added once and removed later to reduce network depth - END
x = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), padding='same', activation='sigmoid')(x)
autoencoder = Model(input_layer, decoded)
My dataset consists of about 50k pictures of size 50x50 (resized to 28x28) showing tiny parts of a cardboard-like sheet (Example 1, Example 2); they were cut from a 900x900 source picture. For the model, I convert them to grayscale.
I tried two ways of normalizing the data: the first one (taken from the websites) divides the values by 255, and the second one uses min-max normalization. The pixel values are in the range 59-168. Here is how I create the dataset:
import glob
import cv2
import numpy as np
imgs = [cv2.imread(fname) for fname in glob.glob('{}/*.jpg'.format(dir_with_pictures))]
imgs = [img for img in imgs if img is not None]  # drop pictures that OpenCV could not load for some reason
dataset = np.array([np.expand_dims(cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), (28, 28)), axis=-1) for img in imgs])
dataset = dataset.astype('float32')  # cast before normalizing so uint8 arithmetic cannot wrap
# use ONE of the two normalizations:
dataset = (dataset - np.min(dataset)) / (np.max(dataset) - np.min(dataset))  # way 1: min-max normalization, OR
# dataset = dataset / 255.0  # way 2: divide by 255
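With pixel values confined to 59-168, the two normalizations produce quite different input ranges, which also changes the scale of the reported loss. A small NumPy illustration; the four pixel values are made up and only span the range stated in the question:

```python
import numpy as np

# hypothetical stand-in for the grayscale dataset: a few pixels spanning
# the 59..168 range reported in the question, shaped like (N, H, W, 1)
dataset = np.array([59, 100, 130, 168], dtype=np.uint8).reshape(1, 2, 2, 1)
data = dataset.astype("float32")  # cast first so uint8 arithmetic cannot wrap

# way 1: min-max normalization stretches the data to the full [0, 1] range
minmax = (data - data.min()) / (data.max() - data.min())

# way 2: dividing by 255 leaves it squeezed into roughly [0.23, 0.66]
div255 = data / 255.0

print(minmax.min(), minmax.max())  # 0.0 1.0
print(div255.min(), div255.max())
```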
Then I compile the model and train it on either the whole dataset or the training set created by an 80/20 split:
autoencoder.compile(optimizer=Adam(learning_rate=1e-3), loss='mse', metrics=['accuracy'])
Is it possible that the items in my dataset are 'too similar' to each other, and that's why I can't train a good model? Or could something else be wrong with the dataset? What can I try to get a converging model? Should I pay attention to the accuracy metric?
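On the accuracy question: for a single-channel output, tf.keras (as far as I know) resolves metrics=['accuracy'] to binary accuracy, which counts exact matches between the target and the thresholded prediction. For continuous grayscale targets that almost never happens, so the metric stays near 0 even for a near-perfect reconstruction. A NumPy sketch with made-up data, re-implementing that definition rather than calling Keras:

```python
import numpy as np


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of entries where y_true exactly equals the thresholded prediction."""
    return np.mean(y_true == (y_pred > threshold).astype(y_true.dtype))


rng = np.random.default_rng(0)
# hypothetical normalized grayscale targets, strictly inside (0, 1)
y_true = rng.uniform(0.23, 0.66, size=(4, 28, 28, 1)).astype("float32")
y_pred = y_true + 0.001  # an almost perfect reconstruction

mse = float(np.mean((y_true - y_pred) ** 2))
print(binary_accuracy(y_true, y_pred))  # 0.0
print(mse)                              # about 1e-6
```

For an autoencoder, the reconstruction loss itself (mse or mae) is the more meaningful number to watch.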

keep getting ERROR : Input 0 of layer lstm_8 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 94, 94, 32)

I'm trying to do alphabet classification using an LSTM model. The input is a 96x96 binary image.
The first step is loading the data:
import os
import cv2
import numpy as np
DATADIR = '/content/drive/MyDrive/Mihir_etal_Assamese_Dataset/DATASET_1'
CATEGORIES = ['1','2','3','4','5']
kernel = np.ones((2,2), np.int8)
training_data = []
for category in CATEGORIES:
    path = os.path.join(DATADIR, category)
    label = CATEGORIES.index(category)
    for img in os.listdir(path):
        img_array = cv2.imread(os.path.join(path,img), cv2.IMREAD_GRAYSCALE)
        ret, img_array = cv2.threshold(img_array, 0, 255, cv2.THRESH_OTSU)
        gray = 255*(img_array < 128).astype(np.uint8)
        coords = cv2.findNonZero(gray)  # find all non-zero points (the inverted text)
        x, y, w, h = cv2.boundingRect(coords)
        img_array = gray[y:y+h, x:x+w]  # crop the image to the text
        img_array = 255*(img_array < 128).astype(np.uint8)
        img_array = cv2.resize(img_array, (96,96))
        training_data.append([img_array, label])
Then I convert the list of training data into X and y arrays for the model input:
X = []
y = []
for data, label in training_data:
    X.append(data)
    y.append(label)
X = np.array(X).reshape(-1,96,96,1)
y = np.array(y)
Then I split the dataset:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
Building the model:
from keras import Sequential
LSTM = tf.keras.layers.LSTM
Dropout = tf.keras.layers.Dropout
Dense = tf.keras.layers.Dense
Conv2D = tf.keras.layers.Conv2D
model = Sequential()
model.add(Conv2D(32, (3,3), input_shape = X_train.shape[:1]))
model.add(LSTM(32))
model.add(Dropout(0.5))
model.add(LSTM(50, return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(50, return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(50))
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))
I've tried all kinds of solutions, but I keep getting a dimension mismatch at my model input. I tried removing and adding the Conv2D layer, but it makes no difference; I still get the same error.
Alright, I found the problem. When stacking multiple LSTM layers, you have to set return_sequences=True on every layer except the last one, so that each layer receives the 3D sequence input it expects.
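The error in the title also comes from feeding the 4D Conv2D output (None, 94, 94, 32) straight into an LSTM, which needs 3D (batch, timesteps, features) input, and input_shape presumably should be X_train.shape[1:] rather than X_train.shape[:1]. A sketch that fixes both, assuming each of the 94 rows of the conv feature map is treated as one timestep (a Reshape layer is one choice among several; TimeDistributed or dropping the Conv2D would also work):

```python
import tensorflow as tf
from tensorflow.keras import layers

IMG_SIZE = 96
NUM_CLASSES = 5


def build_model(img_size=IMG_SIZE, num_classes=NUM_CLASSES):
    conv_size = img_size - 2  # a 3x3 'valid' convolution trims one pixel per side: 96 -> 94
    return tf.keras.Sequential([
        tf.keras.Input(shape=(img_size, img_size, 1)),   # i.e. X_train.shape[1:]
        layers.Conv2D(32, (3, 3)),                        # -> (None, 94, 94, 32), 4D
        # LSTMs need 3D input: each of the 94 rows becomes one timestep whose
        # features are that row's 94 * 32 conv activations
        layers.Reshape((conv_size, conv_size * 32)),      # -> (None, 94, 3008)
        layers.LSTM(32, return_sequences=True),           # -> (None, 94, 32)
        layers.Dropout(0.5),
        layers.LSTM(50, return_sequences=True),           # -> (None, 94, 50)
        layers.Dropout(0.5),
        layers.LSTM(50, return_sequences=True),
        layers.Dropout(0.5),
        layers.LSTM(50),                                  # last LSTM: (None, 50)
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])


model = build_model()
model.summary()
```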

CNN for 5 Classifications: ValueError: Shapes (None, 228, 228, 1) and (None, 1) are incompatible

I am building a CNN to classify images into 5 different classes/diseases. I am pretty sure I have done something wrong with my one-hot encoding or with creating my train_y label values, but I cannot figure out how to fix it. Any help is appreciated! I keep getting the error:
ValueError: Shapes (None, 228, 228, 1) and (None, 1) are incompatible
Here is my code:
import os
import random
import cv2
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D
DATADIR = "/Users/...pathname"
CATEGORIES = ["disease0", "disease1", "disease2", "disease3", "non_disease"]
training_data = []
IMG_SIZE = 228  # for resizing the image, but need to find the right size for this, (5472,3648) is original image size
def creating_training_data():
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)  # gets us into the path for 5 diseases directory
        class_num = CATEGORIES.index(category)  # assign one hot encoding to each disease.. [1,0,0,0,0]
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)  # convert images to an array, IMREAD_COLOR for rgb
                new_array = cv2.resize(img_array, (IMG_SIZE,IMG_SIZE), interpolation=cv2.INTER_AREA)  # would be (5472,3648) at full size. INTER_AREA for shrinking an image
                training_data.append([new_array, class_num])  # would be img_array if not resizing
            except Exception as e:
                pass
creating_training_data()  # calling the function
random.shuffle(training_data)  # shuffling the data
#for sample in training_data[:10]:
#    print(sample[1])
train_X = []  # packing shuffled data into the variables we will use right before feeding it to neural network
train_y = []  # could also put validation set here
for features, label in training_data:  # building out lists for features and labels
    train_X.append(features)
    train_y.append(label)
# convert train_X into a numpy array
train_X = np.array(train_X).reshape(-1, IMG_SIZE, IMG_SIZE, 1)  # -1 lets NumPy infer the number of images; last dim is 1 for grey (3 for RGB)
train_y = np.array(train_y)  # a plain list has no .shape
# normalize the data by scaling...for pixel data min is 0 and max is 255
train_X = train_X/255.0  # may need to use keras.utils.normalize to perform this instead
print(train_X.shape)
print(train_y.shape)
# Creating our CNN
model = Sequential()
model.add(Conv2D(64, (3,3), input_shape=train_X.shape[1:]))  # skip the -1...using the shape of the data (228,228,1)
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))  # Now we have a 2x64 CNN
model.add(Flatten())  # Flatten the data because Convolutional is 2D whereas the dense layer wants a 1D data set
model.add(Dense(64))
# Adding Output Layer
model.add(Dense(1))
model.add(Activation('softmax'))
model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["sparse_categorical_accuracy"])
model.fit(train_X, train_y, batch_size=5, epochs=10, validation_split=0.2)
You want your model to predict 5 classes, but look at these lines:
model.add(Dense(1))
model.add(Activation('softmax'))
Here you specified a last softmax layer that predicts only one class. The number of units in the last softmax layer should match the number of classes you want to predict.
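A softmax normalizes over the units of its layer, so over a single unit it returns 1.0 for every input and the model has nothing to learn. A NumPy illustration with made-up logits:

```python
import numpy as np


def softmax(logits):
    # subtract the row max for numerical stability, then normalize over the last axis
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


one_unit = np.array([[-3.2], [0.0], [7.5]])           # Dense(1) outputs for 3 images
five_units = np.array([[0.1, 2.0, -1.0, 0.3, 0.0]])   # Dense(5) outputs for 1 image

print(softmax(one_unit))          # every row is [1.]
print(softmax(five_units).sum())  # the 5 probabilities sum to 1
```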
Change the output layer so that it has one unit per class:
model.add(Dense(5))
model.add(Activation('softmax'))
since you're training the model to predict 5 classes and your data is labelled with 5 classes.
Edit, regarding the labelling of the data: categorical_crossentropy expects one-hot labels, so try building the label lists like this:
import os
import cv2
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D
DATADIR = "/Users/...pathname"
CATEGORIES = ["disease0", "disease1", "disease2", "disease3", "non_disease"]
train_X = []
train_y = []
IMG_SIZE = 228  # for resizing the image, but need to find the right size for this, (5472,3648) is original image size
def creating_training_data():
    for category in CATEGORIES:
        labels = [0, 0, 0, 0, 0]
        path = os.path.join(DATADIR, category)  # gets us into the path for 5 diseases directory
        class_num = CATEGORIES.index(category)  # index of this disease
        labels[class_num] = 1  # one-hot encoding, e.g. [1,0,0,0,0]
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)  # convert images to an array, IMREAD_COLOR for rgb
                new_array = cv2.resize(img_array, (IMG_SIZE,IMG_SIZE), interpolation=cv2.INTER_AREA)  # would be (5472,3648) at full size. INTER_AREA for shrinking an image
                train_X.append(new_array)  # would be img_array if not resizing
                train_y.append(labels)
            except Exception as e:
                pass
creating_training_data()
train_X = np.array(train_X).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
train_X = train_X/255.0  # may need to use keras.utils.normalize to perform this instead
train_y = np.array(train_y)  # shape (num_images, 5)
# validation_split takes the LAST 20% before shuffle=True applies, so shuffle first
perm = np.random.permutation(len(train_X))
train_X, train_y = train_X[perm], train_y[perm]
print(train_X.shape)
print(train_y.shape)
# Creating our CNN
model = Sequential()
model.add(Conv2D(64, (3,3), input_shape=train_X.shape[1:]))  # skip the -1...using the shape of the data (228,228,1)
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))  # Now we have a 2x64 CNN
model.add(Flatten())  # Flatten the data because Convolutional is 2D whereas the dense layer wants a 1D data set
model.add(Dense(64))
# Adding Output Layer
model.add(Dense(5))
model.add(Activation('softmax'))
model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
model.fit(train_X, train_y, batch_size=5, epochs=10, validation_split=0.2, shuffle=True)
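Building the one-hot lists by hand works; two equivalent alternatives are converting the integer class_num labels with tf.keras.utils.to_categorical, or keeping them as integers and compiling with loss="sparse_categorical_crossentropy". A sketch with made-up labels and probabilities showing that both losses give the same value for the same prediction:

```python
import numpy as np
import tensorflow as tf

CATEGORIES = ["disease0", "disease1", "disease2", "disease3", "non_disease"]

# made-up integer labels, as produced by CATEGORIES.index(category)
int_labels = np.array([0, 3, 4, 1])

# option 1: one-hot encode, then keep loss="categorical_crossentropy"
one_hot = tf.keras.utils.to_categorical(int_labels, num_classes=len(CATEGORIES))
print(one_hot[1])  # [0. 0. 0. 1. 0.]

# option 2: keep the integer labels and compile with loss="sparse_categorical_crossentropy"

# both losses agree on the same (made-up) softmax output for the second sample
probs = tf.constant([[0.1, 0.1, 0.1, 0.6, 0.1]])
cce = tf.keras.losses.categorical_crossentropy(one_hot[1:2], probs)
scce = tf.keras.losses.sparse_categorical_crossentropy(int_labels[1:2], probs)
print(float(cce[0]), float(scce[0]))  # both about 0.51, i.e. -log(0.6)
```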

Predict radius of multiple circle in images with CNN

I am trying to calculate the radius of circles in an image using a convolutional neural network.
I have only the image as input and the radius on the output side, so the mapping is [image]->[radius of circle].
The input dimensions and the neural network architecture are as follows:
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers
from tensorflow.keras import Model
img_input = layers.Input(shape=(imgsize, imgsize, 1))
x = layers.Conv2D(16, (3, 3), activation='relu', strides=1, padding='same')(img_input)
x = layers.Conv2D(32, (3, 3), activation='relu', strides=2)(x)
x = layers.Conv2D(128, (3, 3), activation='relu', strides=2)(x)
x = layers.MaxPool2D(pool_size=2)(x)
x = layers.Conv2D(circle_per_box, 1, activation='linear', strides=2)(x)
output = layers.Flatten()(x)
model_CNN = Model(img_input, output)
model_CNN.summary()
model_CNN.compile(loss='mean_squared_error', optimizer='adam', metrics=['mse'])
X_train, X_test, Y_train, Y_test = train_test_split(image, radii, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
(8000, 12, 12, 1) (2000, 12, 12, 1) (8000, 1) (2000, 1)
Y_train
array([[1.01003947],
[1.32057104],
[0.34507285],
...,
[1.53130402],
[0.69527609],
[1.85973669]])
If I train this with one circle per image, I get a solid result.
With more circles per image (see image), however, the same network collapses.
For 2 circles, Y_train looks like this:
Y_train.shape
(10000, 2)
Y_train
array([[1.81214007, 0.68388911],
[1.47920612, 1.04222943],
[1.90827465, 1.43238623],
...,
[1.40865229, 1.65726638],
[0.52878558, 1.94234548],
[1.57923437, 1.19544775]])
Why does the neural network behave this way?
If I try to calculate the radius of the two generated circles in the image separately as described above, I get good results again, but not if two circles are in the image at the same time.
Does anyone have any ideas/suggestions?
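One possible explanation, assuming the two radii in each row of Y are listed in an arbitrary order (the question does not say how they are ordered): the network cannot tell which circle belongs to which output, and under MSE the best it can do is predict something close to the average for both outputs. A sketch of an order-invariant loss that sorts predictions and targets before comparing them; sorted_mse is a hypothetical name, and sorting the targets once with np.sort(Y_train, axis=1) is a simpler variant of the same idea.

```python
import tensorflow as tf


def sorted_mse(y_true, y_pred):
    """Order-invariant MSE for per-image lists of radii.

    y_true, y_pred: (batch, circles_per_image). Both are sorted along the last
    axis before comparing, so predicting [r2, r1] for targets [r1, r2] costs nothing.
    """
    y_true_sorted = tf.sort(y_true, axis=-1)
    y_pred_sorted = tf.sort(y_pred, axis=-1)  # tf.sort is differentiable w.r.t. its values
    return tf.reduce_mean(tf.square(y_true_sorted - y_pred_sorted), axis=-1)


# usage with the question's model (not run here):
# model_CNN.compile(loss=sorted_mse, optimizer='adam', metrics=['mse'])
```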

Tensorflow U-Net segmentation mask input

I am new to tensorflow and Semantic segmentation.
I am designing a U-Net for semantic segmentation. Each image contains one object that I want to classify, but in total I have images of 10 different objects. I am confused about how to prepare my mask input. Is this considered multi-label segmentation, or segmentation of only one class?
Should I one-hot encode my input? Should I use to_categorical? I find examples for multi-class segmentation, but I don't know if that's the case here, because each image contains only one object to detect/classify.
I tried the following code for the input, but I am not sure whether what I am doing is right.
import os
import cv2
import numpy as np
import tensorflow as tf
from tensorflow import keras
# Generation of batches of image and mask
class DataGen(keras.utils.Sequence):
    def __init__(self, image_names, path, batch_size, image_size=128):
        self.image_names = image_names
        self.path = path
        self.batch_size = batch_size
        self.image_size = image_size
    def __load__(self, image_name):
        # Path
        image_path = os.path.join(self.path, "images/aug_test", image_name) + ".png"
        mask_path = os.path.join(self.path, "masks/aug_test", image_name) + ".png"
        # Reading Image
        image = cv2.imread(image_path, 1)
        image = cv2.resize(image, (self.image_size, self.image_size))
        # Reading Mask
        mask = cv2.imread(mask_path, -1)
        mask = cv2.resize(mask, (self.image_size, self.image_size))
        ## Normalizing
        image = image/255.0
        mask = mask/255.0
        return image, mask
    def __getitem__(self, index):
        # slicing past the end just returns a shorter last batch;
        # overwriting self.batch_size here would shrink every later batch
        image_batch = self.image_names[index*self.batch_size : (index+1)*self.batch_size]
        image = []
        mask = []
        for image_name in image_batch:
            _img, _mask = self.__load__(image_name)
            image.append(_img)
            mask.append(_mask)
        # This is where I am defining my input
        image = np.array(image)
        mask = np.array(mask)
        mask = tf.keras.utils.to_categorical(mask, num_classes=10, dtype='float32')  # Is this true?
        return image, mask
    def __len__(self):
        return int(np.ceil(len(self.image_names)/float(self.batch_size)))
Is this correct? If so, what should I change in my input to get the label/class as output? Should I set the pixel values of my mask according to the class?
Here is my U-Net architecture.
# Convolution and deconvolution blocks
def down_scaling_block(x, filters, kernel_size=(3, 3), padding="same", strides=1):
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(x)
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(conv)
    pool = keras.layers.MaxPool2D((2, 2), (2, 2))(conv)
    return conv, pool
def up_scaling_block(x, skip, filters, kernel_size=(3, 3), padding="same", strides=1):
    conv_t = keras.layers.UpSampling2D((2, 2))(x)
    concat = keras.layers.Concatenate()([conv_t, skip])
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(concat)
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(conv)
    return conv
def bottleneck(x, filters, kernel_size=(3, 3), padding="same", strides=1):
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(x)
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(conv)
    return conv
def UNet():
    filters = [16, 32, 64, 128, 256]
    inputs = keras.layers.Input((image_size, image_size, 3))
    '''inputs2 = keras.layers.Input((image_size, image_size, 1))
    conv1_2, pool1_2 = down_scaling_block(inputs2, filters[0])'''
    Input = inputs
    conv1, pool1 = down_scaling_block(Input, filters[0])
    conv2, pool2 = down_scaling_block(pool1, filters[1])
    conv3, pool3 = down_scaling_block(pool2, filters[2])
    '''conv3 = keras.layers.Conv2D(filters[2], kernel_size=(3,3), padding="same", strides=1, activation="relu")(pool2)
    conv3 = keras.layers.Conv2D(filters[2], kernel_size=(3,3), padding="same", strides=1, activation="relu")(conv3)
    drop3 = keras.layers.Dropout(0.5)(conv3)
    pool3 = keras.layers.MaxPooling2D((2,2), (2,2))(drop3)'''
    conv4, pool4 = down_scaling_block(pool3, filters[3])
    bn = bottleneck(pool4, filters[4])
    deConv1 = up_scaling_block(bn, conv4, filters[3])  # 8 -> 16
    deConv2 = up_scaling_block(deConv1, conv3, filters[2])  # 16 -> 32
    deConv3 = up_scaling_block(deConv2, conv2, filters[1])  # 32 -> 64
    deConv4 = up_scaling_block(deConv3, conv1, filters[0])  # 64 -> 128
    outputs = keras.layers.Conv2D(10, (1, 1), padding="same", activation="softmax")(deConv4)
    model = keras.models.Model(inputs, outputs)
    return model
model = UNet()
model.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["acc"])
train_gen = DataGen(train_img, train_path, image_size=image_size, batch_size=batch_size)
valid_gen = DataGen(valid_img, train_path, image_size=image_size, batch_size=batch_size)
test_gen = DataGen(test_img, test_path, image_size=image_size, batch_size=batch_size)
train_steps = len(train_img)//batch_size
valid_steps = len(valid_img)//batch_size
# fit_generator is deprecated; model.fit accepts a keras.utils.Sequence directly
model.fit(train_gen, validation_data=valid_gen, steps_per_epoch=train_steps, validation_steps=valid_steps,
          epochs=epochs)
I hope I explained my question properly. Any help appreciated!
UPDATE: I changed the value of each mask pixel according to the object class. (If the image contains the object that I want to classify as object no. 2, I set the mask pixels to 2, so the whole mask array contains 0 (background) and 2 (object). Accordingly, for each object the mask contains 0 and 3, 0 and 10, etc.)
To do this, I first converted the mask to binary and then, if a pixel value was greater than 1, changed it to 1, 2, 3, ... according to the object/class number.
Then I converted the masks to one-hot with to_categorical, as shown in my code. Training runs, but the network doesn't learn anything: accuracy and loss keep swinging between two values. What is my mistake here? Am I making a mistake when generating the mask (changing the pixel values), or in the to_categorical call?
PROBLEM FOUND:
I was making an error while creating the masks. cv2 reads images as height x width, but I created the masks with class-valued pixels assuming the dimensions were width x height. This was causing the problem and preventing the network from learning anything. It is working now.
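cv2 arrays are indexed as (height, width), i.e. [row, column], while cv2.resize takes its dsize argument as (width, height); mixing the two is an easy way to end up with a transposed mask. A minimal sketch that sidesteps the problem by deriving the class mask elementwise from the loaded binary mask, so it always keeps the image's shape. class_mask is a hypothetical helper, and the non-square array is made up so that a transpose would be visible:

```python
import numpy as np


def class_mask(binary_mask, class_id):
    """Turn a binary (H, W) mask into a class-valued mask of the SAME (H, W) shape.

    binary_mask: array as returned by cv2.imread, indexed [row, col] = [y, x]
    class_id:    integer class of the object in this image (0 is background)
    """
    # np.where works elementwise, so the (height, width) layout is preserved
    return np.where(binary_mask > 0, class_id, 0).astype(np.uint8)


# non-square example: height 4, width 6
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 2:5] = 255
m = class_mask(mask, 2)
print(m.shape)  # (4, 6)
```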
Each image has one object that I want to classify. But in total I have images of 10 different objects. I am confused, how can I prepare my mask input? Is it considered as multi-label segmentation or only for one class?
If your dataset has N different labels (e.g. 0 - background, 1 - dogs, 2 - cats, ...), you have a multi-class problem, even if each image contains only one kind of object.
Should I convert my input to one hot encoded? Should I use to_categorical?
Yes, you should one-hot encode your labels. Whether you can use to_categorical depends on the source format of your labels. Say you have N classes and your labels are (height, width, 1), where each pixel has a value in the range [0, N). In that case keras.utils.to_categorical(label, N) returns a float (height, width, N) label, where each pixel is 0 or 1, and you don't have to divide by 255.
If your source format is different, you may have to write a custom function to get the same output format.
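A quick check of the (height, width, 1) to (height, width, N) behaviour described above, with a made-up 4x6 label and N = 3. It also shows why dividing the mask by 255 first breaks things: to_categorical casts the labels to integers, so the fractional class ids all truncate to 0 (background).

```python
import numpy as np
import tensorflow as tf

N = 3  # made-up number of classes: 0 = background, 1 and 2 = objects
label = np.zeros((4, 6, 1), dtype=np.uint8)  # (height, width, 1), values in [0, N)
label[1:3, 1:3, 0] = 1
label[2:4, 4:6, 0] = 2

one_hot = tf.keras.utils.to_categorical(label, N)
print(one_hot.shape)  # (4, 6, 3)

bad = tf.keras.utils.to_categorical(label / 255.0, N)
print(bad[..., 0].min())  # 1.0: every pixel collapsed to background
```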
Check out this repo (not my work): keras-unet. The notebooks folder contains two examples that train a U-Net on small datasets. They are not multi-class, but it is easy to follow them step by step with your own dataset. Start by loading your labels as:
from PIL import Image
from keras.utils import to_categorical
im = Image.open(mask).resize((512,512), Image.NEAREST)  # nearest-neighbour keeps class ids intact
im = to_categorical(im,NCLASSES)
Reshape and normalize like this:
x = np.asarray(imgs_np, dtype=np.float32)/255
y = np.asarray(masks_np, dtype=np.float32)
y = y.reshape(y.shape[0], y.shape[1], y.shape[2], NCLASSES)
x = x.reshape(x.shape[0], x.shape[1], x.shape[2], 3)
Adapt your model to NCLASSES:
from keras_unet.models import custom_unet
model = custom_unet(
    input_shape,
    use_batch_norm=False,
    num_classes=NCLASSES,
    filters=64,
    dropout=0.2,
    output_activation='softmax')
Select the correct loss:
from keras.optimizers import SGD
from keras_unet.metrics import iou, iou_thresholded
model.compile(
    optimizer=SGD(learning_rate=0.01, momentum=0.99),
    loss='categorical_crossentropy',
    metrics=[iou, iou_thresholded])
Hope it helps
