I have 256×192 pixel images and am looking for a small, fast CNN to act as a "pre-scanner" that finds interesting parts of an image (like 52x52 or 32x32 chunks, etc.), which will then be checked with a more complex CNN. The reason for this small CNN is that it will run on an embedded system with limited resources.
Unfortunately I'm new to this topic, TensorFlow and Keras. My first idea was to create a net with only one 2D convolution that works like a 1D convolution. In this case the kernel should have a height of 192 and a width of 1 (maybe later 3).
This is the model I built with TensorFlow 2:
# Model
model = models.Sequential()
model.add(layers.Conv2D(5, (1, 192), activation='relu', input_shape=(256, 192, 3)))
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
The idea is to get one value per row that indicates whether something "interesting" could be in that row. Based on this information and its neighbors, a bigger part of the image will be cut out and fed into the more complex CNN.
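Purely as an illustration of that post-processing step, here is a minimal sketch of how row scores plus their neighbors could be turned into crop windows once the per-row predictions are in a NumPy vector (the names row_scores, threshold and margin are hypothetical, not part of my code yet):

import numpy as np

def rows_to_windows(row_scores, threshold=0.5, margin=16):
    # Group consecutive "interesting" rows and pad each run by a margin.
    # row_scores: 1-D array of length 256, one score per image row.
    hot = np.flatnonzero(row_scores > threshold)
    windows = []
    if hot.size == 0:
        return windows
    start = prev = hot[0]
    for r in hot[1:]:
        if r > prev + 1:  # gap found, close the current run
            windows.append((max(0, start - margin), min(256, prev + margin)))
            start = r
        prev = r
    windows.append((max(0, start - margin), min(256, prev + margin)))
    return windows  # list of (row_start, row_end) to crop and pass to the bigger CNN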
I have prepared normal 256x192 px images and, for each image, a text file with 256 values (0.0 or 1.0). Each 0/1 represents one row and indicates whether something interesting is in that row or not.
This was my naive plan, but the training crashes immediately with an error that I don't understand:
ValueError: Dimensions must be equal, but are 32 and 256 for 'metrics/accuracy/Equal' (op: 'Equal') with input shapes: [32,256], [32,256,1].
I think my basic idea/strategy is wrong. I don't understand where the 32 comes from. Can someone explain my mistake? And is my idea even feasible?
Edit:
As requested, the complete, dirty code. If there are some major flaws, please be forgiving; this is my first Python experiment.
import tensorflow as tf
import os
from tensorflow.keras import datasets, layers, models
from PIL import Image
from numpy import asarray
train_images = []
train_labels = []
test_images = []
test_labels = []
# Prepare
dir = os.listdir('images/gt_image')
split = len(dir)*0.2
c = 0
for file in dir:
    c = c + 1
    im = Image.open('images/gt_image/' + file)
    data = im.load()
    image = []
    for x in range(0, im.size[0]):
        row = []
        for y in range(0, im.size[1]):
            row.append([x/255 for x in data[x, y]])
        image.append(row)
    if c <= split:
        test_images.append(image)
    else:
        train_images.append(image)
    file = open('images/gt_labels/' + file + '.txt', 'r')
    label = file.readlines()[0].split(', ')
    if c <= split:
        test_labels.append(label)
    else:
        train_labels.append(label)
print('prepare done')
# Model
model = models.Sequential()
model.add(layers.Conv2D(5, (1, 192), activation='relu', input_shape=(256, 192, 3)))
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
print('compile done')
# Learning
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
Related
I am new to the tf.data API and am trying to use it to load images from disk for the Dogs vs. Cats Redux: Kernels Edition Kaggle competition. To do this, I first created a pandas DataFrame named train_df with two columns: file_path, containing the relative path of each image, and target, containing the target labels 0 (for cat) and 1 (for dog). Here's what the first 10 rows of the DataFrame look like:
Then, I tried loading the images with the following code:
import tensorflow as tf
BATCH_SIZE = 128
IMG_HEIGHT = 224
IMG_WIDTH = 224
def read_images(X, y):
    X = tf.io.read_file(X)
    X = tf.io.decode_image(X, expand_animations=False, dtype=tf.float32, channels=3)
    X = tf.image.resize(X, [IMG_HEIGHT, IMG_WIDTH])
    X = tf.keras.applications.efficientnet.preprocess_input(X, data_format="channels_last")
    return (X, y)

def build_data_pipeline(X, y):
    data = tf.data.Dataset.from_tensor_slices((X, y))
    data = data.map(read_images)
    data = data.batch(BATCH_SIZE)
    data = data.prefetch(tf.data.AUTOTUNE)
    return data
tf_data = build_data_pipeline(train_df["file_path"], train_df["target"])
After this, I tried training my model using the following code:
model.fit(tf_data, epochs=10)
but got a training accuracy of only 50%, whereas with ImageDataGenerator I get an accuracy of 99%. So the problem lies somewhere in the data-loading part, which I have not been able to pin down.
I have used EfficientNetB0 with ImageNet weights as the feature extractor and a single-neuron layer at the end as the classifier.
Pretrained EfficientNetB0 model:
pretrained_model = tf.keras.applications.EfficientNetB0(
    input_shape=(IMG_HEIGHT, IMG_WIDTH, 3),
    include_top=False,
    weights="imagenet"
)

for layer in pretrained_model.layers:
    layer.trainable = False
Dense layer with one neuron at the end of the EfficientNetB0:
pretrained_output = pretrained_model.get_layer('top_activation').output
x = tf.keras.layers.GlobalAveragePooling2D()(pretrained_output)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.models.Model(pretrained_model.input, x)
Compiling the model:
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"]
)
In the above notebook, change the input reading function read_images as follows:
def read_images(X, y):
    X = tf.io.read_file(X)
    X = tf.image.decode_jpeg(X, channels=3)
    X = tf.image.resize(X, [IMG_HEIGHT, IMG_WIDTH])  # /255.0
    return (X, y)
Also note that the tf.keras.applications.EfficientNet models have a built-in normalization layer, so it's better not to normalize the data in this function (i.e. no /255.0). This is likely why the original read_images gave ~50% accuracy: tf.io.decode_image with dtype=tf.float32 already rescales the pixels to [0, 1], while EfficientNet expects raw values in the [0, 255] range.
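If in doubt, a quick sanity check (a hypothetical snippet, reusing the tf_data pipeline built above) can confirm the pixel range reaching the model; EfficientNet's built-in rescaling expects raw values in [0, 255]:

for images, labels in tf_data.take(1):
    # with the corrected read_images this should print a max of roughly 255, not 1.0
    print(images.shape, float(tf.reduce_min(images)), float(tf.reduce_max(images)))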
I am trying to build a transfer-learning model to classify images. The images are grayscale (2D). Previously I used the image_dataset_from_directory method to read the images and there was no problem. However, I am trying to use a custom read function to have more control over and access to the data, such as knowing how many images are in each class. When using this custom read function, I get an error (shown below) while trying to train the model. I am not sure what caused this error.
Part 1: reading the dataset
import numpy as np
import os
import tensorflow as tf
import cv2
from tensorflow import keras
# neural network
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers.experimental import preprocessing
IMG_WIDTH=160
IMG_HEIGHT=160
DATA_PATH = r"C:\Users\user\Documents\chest_xray"
TRAIN_DIR = os.path.join(DATA_PATH, 'train')
def create_dataset(img_folder):
    img_data_array = []
    class_name = []
    for dir1 in os.listdir(img_folder):
        for file in os.listdir(os.path.join(img_folder, dir1)):
            image_path = os.path.join(img_folder, dir1, file)
            image = cv2.imread(image_path, 0)
            image = cv2.resize(image, (IMG_HEIGHT, IMG_WIDTH), interpolation=cv2.INTER_AREA)
            image = np.array(image)
            image = image.astype('float32')
            image /= 255
            img_data_array.append(image)
            class_name.append(dir1)
    return img_data_array, class_name

# extract the image array and class name
img_data, class_name = create_dataset(TRAIN_DIR)

target_dict = {k: v for v, k in enumerate(np.unique(class_name))}
target_dict

target_val = [target_dict[class_name[i]] for i in range(len(class_name))]
This part produces a list with 5232 entries; each entry is a NumPy array of size 160x160 (float32).
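(A quick, optional check of what create_dataset returns, using the variables defined above:)

print(len(img_data))        # 5232
print(img_data[0].shape)    # (160, 160)
print(img_data[0].dtype)    # float32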
Part 2: creating the model
def build_model():
    inputs = tf.keras.Input(shape=(160, 160, 3))
    x = Sequential(
        [
            preprocessing.RandomRotation(factor=0.15),
            preprocessing.RandomTranslation(height_factor=0.1, width_factor=0.1),
            preprocessing.RandomFlip(),
            preprocessing.RandomContrast(factor=0.1),
        ],
        name="img_augmentation",
    )(inputs)
    # x = img_augmentation(inputs)
    model = tf.keras.applications.EfficientNetB7(include_top=False,
                                                 drop_connect_rate=0.4,
                                                 weights='imagenet',
                                                 input_tensor=x)
    # Freeze the pretrained weights
    model.trainable = False

    # Rebuild top
    x = tf.keras.layers.GlobalAveragePooling2D(name="avg_pool")(model.output)
    x = tf.keras.layers.BatchNormalization()(x)
    top_dropout_rate = 0.2
    x = tf.keras.layers.Dropout(top_dropout_rate, name="top_dropout")(x)
    outputs = tf.keras.layers.Dense(1, name="pred")(x)

    # Compile
    model = tf.keras.Model(inputs, outputs, name="EfficientNet")
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)
    model.compile(
        optimizer=optimizer,
        loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
        metrics=["accuracy"]
    )
    return model
model = build_model()
Part 3: training the model
history = model.fit(x=np.array(img_data), y=np.array(target_val), epochs=5)
The error I get:
ValueError: Shape must be rank 4 but is rank 3 for '{{node
EfficientNet/img_augmentation/random_rotation_1/transform/ImageProjectiveTransformV3}} =
ImageProjectiveTransformV3[dtype=DT_FLOAT, fill_mode="REFLECT", interpolation="BILINEAR"]
(IteratorGetNext, EfficientNet/img_augmentation/random_rotation_1/rotation_matrix/concat,
EfficientNet/img_augmentation/random_rotation_1/transform/strided_slice,
EfficientNet/img_augmentation/random_rotation_1/transform/fill_value)' with input shapes:
[?,160,160], [?,8], [2], [].
The problem in the code is that OpenCV reads the image in grayscale format, but the grayscale image it returns has shape (160,160), not (160,160,1).
Because of this, the error is thrown.
I managed to replicate your problem by testing it locally.
Say we randomly train on 12 samples.
Possible input formats:
# (1) This one works
history = model.fit(x=np.random.rand(12,160,160,3), y=np.array([1,1,1,1,1,1,0,0,0,0,0,0]), epochs=5, verbose=1)

# (2) This one works
history = model.fit(x=np.random.rand(12,160,160,1), y=np.array([1,1,1,1,1,1,0,0,0,0,0,0]), epochs=5, verbose=1)

# (3) This one fails
history = model.fit(x=np.random.rand(12,160,160), y=np.array([1,1,1,1,1,1,0,0,0,0,0,0]), epochs=5, verbose=1)

(1) and (2) work.
(3) fails, yielding:
ValueError: Shape must be rank 4 but is rank 3 for '{{node
EfficientNet/img_augmentation/random_rotation_4/transform/ImageProjectiveTransformV2}} = ImageProjectiveTransformV2[dtype=DT_FLOAT, fill_mode="REFLECT", interpolation="BILINEAR"](IteratorGetNext,
EfficientNet/img_augmentation/random_rotation_4/rotation_matrix/concat,
EfficientNet/img_augmentation/random_rotation_4/transform/strided_slice)'
with input shapes: [?,160,160], [?,8], [2].
Therefore, ensure that your data format is in the shape (160,160,1) or (160,160,3).
As an alternative, after you read the image with OpenCV, you can use
image = np.expand_dims(image, axis=-1)
to programmatically insert the last axis (the grayscale channel).
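For example, a minimal sketch of where this fix could go inside the create_dataset function from the question (everything else unchanged):

image = cv2.imread(image_path, 0)
image = cv2.resize(image, (IMG_HEIGHT, IMG_WIDTH), interpolation=cv2.INTER_AREA)
image = image.astype('float32')
image /= 255
image = np.expand_dims(image, axis=-1)  # shape becomes (160, 160, 1)
img_data_array.append(image)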
I have been trying to train a network consisting of a VGG16 network followed by some LSTM layers. Since my images are quite big and because VGG16 does not scale with image size, I decided to break my images into small patches and train an LSTM to read an image piece by piece. Since I have a large dataset, I need the model to load the data batch-wise. I have made an attempt to build a custom data generator; however, I am not sure my implementation is correct, since I don't know how to see which images the model has loaded while it is training. I also don't know how to make it adaptable if I have a different number of patches at each epoch.
My dataset is organised as follows:
data|
|---train|
|---class1|
|---image1|
|---im1_patch1.tif
|---im1_patch2.tif
...
|---im1_patch352.tif
|---image2|
|---im2_patch1.tif
|---im2_patch2.tif
...
|---im2_patch352.tif
|---class2|
|---image3|
|---im3_patch1.tif
|---im3_patch2.tif
...
|---im3_patch352.tif
|---image4|
|---im4_patch1.tif
|---im4_patch2.tif
...
|---im4_patch352.tif
As you can see, my images are already broken into patches, and I would like to load them batch-wise so that my tensor X for each batch has the following dimensions: [batch_size, n_patches, w, h, n_channels]. batch_size is the number of images at each epoch, n_patches is the number of patches per image, w and h are the dimensions of each patch (fixed), and n_channels is the number of channels of each patch (fixed to 3).
First, I have some questions:
Does it make sense to have a batch_size > 1 for each epoch? I guess it doesn't make sense if my images have different numbers of patches, am I right?
Can I feed my LSTM a different number of patches for each epoch, for example if my images have different numbers of patches? I am not sure how to do that.
Let's say I load one image (consisting of n_patches) per epoch; what happens when the number of epochs is bigger than the number of images? Does the loading restart from 0 (it doesn't seem to in my code)? Maybe I should pick images randomly, what do you think?
My network architecture is the following:
vgg = VGG16(
    include_top=False,
    weights='imagenet',
    input_shape=(224, 224, 3)
)

for layer in vgg.layers:
    layer.trainable = False
model = Sequential()
model.add(TimeDistributed(vgg, input_shape=(npatches, w, h, nchannels)))
model.add(TimeDistributed(Flatten(name="flatten")))
model.add(TimeDistributed(Dense(4096, activation="relu")))
model.add(TimeDistributed(Dropout(0.5)))
model.add(TimeDistributed(Dense(4096, activation="relu")))
model.add(TimeDistributed(Dropout(0.5)))
model.add(TimeDistributed(Flatten(name="flatten")))
model.add(LSTM(264, activation='tanh',return_sequences=True))
model.add(LSTM(128, activation='tanh',return_sequences=True))
model.add(LSTM(64, activation='tanh',return_sequences=False))
model.add(Dense(64, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(1, activation='sigmoid'))
model.summary()
Finally, my attempt to build a custom image generator:
def LoadImages(folder, batch_size, n_patches):
    batch_start = 0
    batch_end = n_patches - 1
    epoch_id = 0

    files = list(paths.list_images(folder))
    n_images = len(files)
    StopCriteria = n_images / (batch_size * n_patches)

    while True:
        while epoch_id < StopCriteria:
            patch_id = 0
            X = np.empty([batch_size, n_patches, 224, 224, 3])
            Y = np.empty([batch_size, n_patches])
            for image_path in files[batch_start:batch_end]:
                img = tf.keras.preprocessing.image.load_img(image_path, color_mode="rgb")
                input_arr = keras.preprocessing.image.img_to_array(img)
                X[epoch_id, patch_id, :, :, :] = input_arr
                if image_path.split("\\")[-3].split('/')[-1] == 'long':
                    Y[epoch_id, patch_id] = 0
                if image_path.split("\\")[-3].split('/')[-1] == 'short':
                    Y[epoch_id, patch_id] = 1
                patch_id += 1
            yield (X, Y)
            batch_start += batch_size + n_patches - 1
            batch_end += batch_size + n_patches - 1
            epoch_id += 1
Thank you!
I have an image dataset that I want to use to train a CNN. I have initialized a "tensorflow.python.data.ops.dataset_ops.DatasetV1Adapter" object, which I understand is essentially an iterator that caches the training images in batches so that the entire dataset need not be loaded at once.
I have received this error when trying to call model.fit():
ValueError: Error when checking input: expected conv2d_input to have 4 dimensions, but got array with
shape (None, 1)
I understand that I need to add a dimension to my model input. I want to add a channels dimension to my images. I have tried to use np.expand_dims() and tf.expand_dims() on my "tensorflow.python.data.ops.dataset_ops.DatasetV1Adapter" object, but the former changes the object type and the latter is not supported for this class. Any help is appreciated. Below is my model structure:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.summary()
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
history = model.fit(train_data, epochs=10, validation_data=(val_data), steps_per_epoch=x,
                    validation_steps=y)
I have been following the tutorial in the example listed here, https://www.tensorflow.org/tutorials/load_data/images, but have tried to create and load in my own data set.
Below is my tf pipeline:
BATCH_SIZE = 32
IMG_HEIGHT = 224
IMG_WIDTH = 224
STEPS_PER_EPOCH = np.ceil(image_count/BATCH_SIZE)
data_dir = 'C:\\Users\\rtlum\\Documents\\DataSci_Projects\\PythonTensorFlowProjects\\google-images-download\\images'
list_ds = tf.data.Dataset.list_files(str(data_dir+"*.jpg")) #Make dataset of file paths
class_names = ['sad', 'angry']
size = 0
for count in enumerate(list_ds):
    size += 1
val_data_size = size * .2

for f in list_ds.take(5):  # test for correct file paths
    print(f.numpy())

def get_label(file_path):
    # convert the path to a list of path components
    parts = tf.strings.split(file_path, os.path.sep)
    # The second to last is the class-directory
    return parts[-2] == class_names

def decode_img(img):
    # convert the compressed string to a 3D uint8 tensor
    img = tf.image.decode_jpeg(img, channels=3)
    # Use `convert_image_dtype` to convert to floats in the [0,1] range.
    img = tf.image.convert_image_dtype(img, tf.float32)
    # resize the image to the desired size.
    return tf.image.resize(img, [64, 64])

def process_path(file_path):
    label = get_label(file_path)
    # load the raw data from the file as a string
    img = tf.io.read_file(file_path)
    img = decode_img(img)
    return img, label

# Set `num_parallel_calls` so multiple images are loaded/processed in parallel.
labeled_ds = list_ds.map(process_path)

for image, label in labeled_ds.take(1):
    print("Image shape: ", image.numpy().shape)
    print("Label: ", label.numpy())

shuffle_buffer_size = 1000

def prepare_for_training(ds, cache=True, shuffle_buffer_size=1000):
    # This is a small dataset, only load it once, and keep it in memory.
    # use `.cache(filename)` to cache preprocessing work for datasets that don't
    # fit in memory.
    if cache:
        if isinstance(cache, str):
            ds = ds.cache(cache)
        else:
            ds = ds.cache()
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)
    # Repeat forever
    ds = ds.repeat()
    ds = ds.batch(BATCH_SIZE)
    # `prefetch` lets the dataset fetch batches in the background while the model
    # is training.
    ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return ds
ds = prepare_for_training(list_ds)
val_data = ds.take(int(val_data_size))
train_data = ds.skip(int(val_data_size))
Two problems:
1. Your wildcard for directory matching appears to be incorrect.
Looking at your code, it seems that your data needs to follow a structure like:
data
|- sad
|- 1.jpg
...
|- angry
|- 1.jpg
...
This is not what you're matching when you say:
tf.data.Dataset.list_files(str(data_dir+"*.jpg"))
It should be:
tf.data.Dataset.list_files(str(data_dir+os.path.sep+"*"+os.path.sep+"*.jpg"))
2. You have the wrong dataset.
You have:
ds = prepare_for_training(list_ds)
It should be:
ds = prepare_for_training(labeled_ds)
Other issues (a sketch addressing these follows below):
You are resizing the data to 64x64, but your model accepts 32x32 inputs.
You have 2 labels, but your model expects 10 classes.
You don't have a model compilation line (i.e. model.compile(...)).
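A minimal sketch of a model head and compile step that would be consistent with this pipeline (assuming you keep the 64x64 resize and the two classes, and reusing the layers/models names from your question; this is just one way to line things up):

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(2, activation='softmax'))  # 2 classes: 'sad' and 'angry'

model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # get_label yields one-hot style labels
              metrics=['accuracy'])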
I am trying to figure out how to train a CNN in Keras without using ImageDataGenerator. Essentially I'm trying to figure out the magic behind the ImageDataGenerator class so that I don't have to rely on it for all my projects.
I have a dataset organized into 2 folders: training_set and test_set. Each of these folders contains 2 sub-folders: cats and dogs.
I am loading them all into memory using Keras' load_img function in a for loop, as follows:
trainingImages = []
trainingLabels = []
validationImages = []
validationLabels = []
imgHeight = 32
imgWidth = 32
inputShape = (imgHeight, imgWidth, 3)
print('Loading images into RAM...')
for path in imgPaths:
    classLabel = path.split(os.path.sep)[-2]
    classes.add(classLabel)
    img = img_to_array(load_img(path, target_size=(imgHeight, imgWidth)))
    if path.split(os.path.sep)[-3] == 'training_set':
        trainingImages.append(img)
        trainingLabels.append(classLabel)
    else:
        validationImages.append(img)
        validationLabels.append(classLabel)
trainingImages = np.array(trainingImages)
trainingLabels = np.array(trainingLabels)
validationImages = np.array(validationImages)
validationLabels = np.array(validationLabels)
When I print the shape of trainingImages and trainingLabels, I get:
Shape of trainingImages: (8000, 32, 32, 3)
Shape of trainingLabels: (8000,)
My model looks like this:
model = Sequential()
model.add(Conv2D(32, (3, 3), padding="same", input_shape=inputShape))
model.add(Activation("relu"))
model.add(Flatten())
model.add(Dense(len(classes)))
model.add(Activation("softmax"))
And when I compile and try to fit the data, I get:
ValueError: Error when checking target: expected activation_2 to have shape (2,) but got array with shape (1,)
Which tells me my data is not input into the system correctly. How can I properly prepare my data arrays without using ImageDataGenerator?
The error is because of your model definition, not ImageDataGenerator (which I don't see used in the code you have posted). I am assuming that len(classes) = 2 because of the error message you are getting. You are getting the error because the last layer of your model expects trainingLabels to have a vector of size 2 for each data point, but your trainingLabels is a 1-D array.
To fix this, you can either change your last layer to have just 1 unit, because it's binary classification:
model.add(Dense(1))
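(If you go this route, note that the existing softmax activation over a single unit would always output 1; the usual pairing for a one-unit head is a sigmoid activation with binary_crossentropy, and the labels then need to be numeric 0/1 rather than strings. A sketch:)

model.add(Dense(1))
model.add(Activation("sigmoid"))
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])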
or you can change your training and validation labels to vectors using one-hot encoding:
from keras.utils import to_categorical
training_labels_one_hot = to_categorical(trainingLabels)
validation_labels_one_hot = to_categorical(validationLabels)
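Note that to_categorical expects integer class indices, so if trainingLabels contains strings such as 'cats'/'dogs' they have to be mapped to integers first. A minimal sketch (the mapping step is an assumption about your label format):

import numpy as np
from keras.utils import to_categorical

class_to_index = {name: i for i, name in enumerate(sorted(set(trainingLabels)))}
training_labels_one_hot = to_categorical(np.array([class_to_index[l] for l in trainingLabels]))
validation_labels_one_hot = to_categorical(np.array([class_to_index[l] for l in validationLabels]))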