I am trying to calculate the radius of circles in an image using a convolutional neural network.
I have only the image as input and the radius on the output side, so the mapping is [image]->[radius of circle].
The input dimensions and neural network architecture are as follows:
from tensorflow.keras import layers
from tensorflow.keras import Model
img_input = layers.Input(shape=(imgsize, imgsize, 1))
x = layers.Conv2D(16, (3,3), activation='relu', strides =1, padding = 'same')(img_input)
x = layers.Conv2D(32, (3,3), activation='relu', strides = 2)(x)
x = layers.Conv2D(128, (3,3), activation='relu', strides = 2)(x)
x = layers.MaxPool2D(pool_size=2)(x)
x = layers.Conv2D(circle_per_box, 1, activation='linear', strides = 2)(x)
output = layers.Flatten()(x)
model_CNN = Model(img_input, output)
model_CNN.summary()
model_CNN.compile(loss='mean_squared_error',optimizer= 'adam', metrics=['mse'])
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(image, radii, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
(8000, 12, 12, 1) (2000, 12, 12, 1) (8000, 1) (2000, 1)
Y_train
array([[1.01003947],
[1.32057104],
[0.34507285],
...,
[1.53130402],
[0.69527609],
[1.85973669]])
If I calculate this for one circle per image, I get a solid result:
With more circles (see image) per image, however, the same network collapses and I get the following result:
The shape of Y_train for two circles reads:
Y_train.shape
(10000, 2)
Y_train
array([[1.81214007, 0.68388911],
[1.47920612, 1.04222943],
[1.90827465, 1.43238623],
...,
[1.40865229, 1.65726638],
[0.52878558, 1.94234548],
[1.57923437, 1.19544775]])
Why does the neural network behave this way?
If I try to calculate the radius of the two generated circles in the image separately as described above, I get good results again, but not if two circles are in the image at the same time.
Does anyone have any ideas/suggestions?
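One thing worth checking (this is an assumption about the data generation, not something shown above): with two circles per image, the network has no way of knowing which of its two outputs should predict which circle. If the target radii are stored in arbitrary order, mean squared error pushes both outputs toward the average radius, which looks exactly like a collapse. Sorting the targets per sample, for example, gives the outputs a fixed convention:
import numpy as np

# Sort the two radii within each row so that output 0 always targets the
# smaller circle and output 1 the larger one (any consistent ordering works).
Y_train = np.sort(Y_train, axis=1)
Y_test = np.sort(Y_test, axis=1)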
I've created a model to predict emotion from speech. When I try to extract the voice features, I get the error:
cannot reshape array of size 486 into shape (1,1)
I tried different reshapes, but nothing worked. If I change the reshape to (1, -1), I get another error:
ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 162, 1), found shape=(None, 486)
This is my model:
# scaling our data with sklearn's Standard scaler
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
x_train.shape, y_train.shape, x_test.shape, y_test.shape
# making our data compatible to model.
x_train = np.expand_dims(x_train, axis=2)
x_test = np.expand_dims(x_test, axis=2)
x_train.shape, y_train.shape, x_test.shape, y_test.shape
model=Sequential()
model.add(Conv1D(256, kernel_size=5, strides=1, padding='same', activation='relu', input_shape=(x_train.shape[1], 1)))
model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same'))
model.add(Conv1D(256, kernel_size=5, strides=1, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same'))
model.add(Conv1D(128, kernel_size=5, strides=1, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same'))
model.add(Dropout(0.2))
model.add(Conv1D(64, kernel_size=5, strides=1, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same'))
model.add(Flatten())
model.add(Dense(units=32, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(units=8, activation='softmax'))
model.compile(optimizer = 'adam' , loss = 'categorical_crossentropy' , metrics = ['accuracy'])
model.summary()
rlrp = ReduceLROnPlateau(monitor='loss', factor=0.4, verbose=0, patience=100, min_lr=0.0000001)
history=model.fit(x_train, y_train, batch_size=23, epochs=50, validation_data=(x_test, y_test), callbacks=[rlrp])
And this is the extract features function:
def extract_features(data, **kwargs):
# ZCR
result = np.array([])
zcr = np.mean(librosa.feature.zero_crossing_rate(y=data).T, axis=0)
result=np.hstack((result, zcr)) # stacking horizontally
# Chroma_stft
stft = np.abs(librosa.stft(data))
chroma_stft = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
result = np.hstack((result, chroma_stft)) # stacking horizontally
# MFCC
mfcc = np.mean(librosa.feature.mfcc(y=data, sr=sample_rate).T, axis=0)
result = np.hstack((result, mfcc)) # stacking horizontally
# Root Mean Square Value
rms = np.mean(librosa.feature.rms(y=data).T, axis=0)
result = np.hstack((result, rms)) # stacking horizontally
# MelSpectogram
mel = np.mean(librosa.feature.melspectrogram(y=data, sr=sample_rate).T, axis=0)
result = np.hstack((result, mel)) # stacking horizontally
return result
def get_features(path):
# duration and offset are used to take care of the silence at the start and end of each audio file, as seen above.
data, sample_rate = librosa.load(path, duration=2.5, offset=0.6)
# without augmentation
res1 = extract_features(data)
result = np.array(res1)
# data with noise
noise_data = noise(data)
res2 = extract_features(noise_data)
result = np.vstack((result, res2)) # stacking vertically
# data with stretching and pitching
new_data = stretch(data)
data_stretch_pitch = pitch(new_data, sample_rate)
res3 = extract_features(data_stretch_pitch)
result = np.vstack((result, res3)) # stacking vertically
return result
And here is the main block where I get the error:
if __name__ == "__main__":
# load the saved model (after training)
print("Please talk")
filename = "test.wav"
# record the file (start talking)
record_to_file(filename)
# extract features and reshape it
features =get_features(filename).reshape(1, -1)
# predict
result = model.predict(features)[0]
# show the result !
print("result:", result)
Any thoughts about this error?
IIUC, your error comes from the shape of features; maybe this helps you.
For example, say you have features like below:
features = np.random.rand(1, 486)
# features.shape
# (1, 486)
Then you need to split these features into three parts:
features = np.array_split(features, 3, axis=1)
features_0 = features[0] # shape : (1, 162)
features_1 = features[1] # shape : (1, 162)
features_2 = features[2] # shape : (1, 162)
Then expand_dims and predict like below:
features_0 = np.expand_dims(features_0, axis=2)
result = model.predict(features_0)[0]
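For context on where the 486 comes from: with librosa's default parameters, extract_features() returns 1 (ZCR) + 12 (chroma) + 20 (MFCC) + 1 (RMS) + 128 (mel) = 162 values, and get_features() stacks three augmented rows (original, noise, stretch+pitch), giving shape (3, 162); flattening that yields the 486. So instead of flattening and splitting, you can keep the (3, 162) shape and predict the three rows as a small batch. A sketch, assuming the scaler fitted during training is still available in the prediction script:
features = get_features(filename)             # shape (3, 162): original, noise, stretch+pitch
features = scaler.transform(features)         # apply the same scaling used on the training data
features = np.expand_dims(features, axis=2)   # shape (3, 162, 1), matching the model's input
result = model.predict(features)              # one prediction per augmented row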
I would like to build a model in Keras that predicts what regions of an image are important, using this model:
crop_points = keras.Sequential([
Conv2D(8, (3,3), input_shape=(28, 28, 1)),
MaxPooling2D(),
Conv2D(8, (3,3)),
MaxPooling2D(),
Conv2D(8, (3,3)),
Flatten(),
Dense(16),
RepeatVector(num_samples),
LSTM(32, return_sequences=True),
TimeDistributed(Dense(2))
])
The model predicts a tensor of length num_samples containing the origins of the cropping regions, with shape = (num_samples, 2). I would now like to have a Lambda layer (or a custom layer if that works better) that crops the input image at each of those predicted origins for further processing with another model. This needs to be done in Keras, as this model should be trained end-to-end and, in the end, be exported to CoreML.
My current Lambda layer looks like this:
# Lambda layer
def crop_image(tensor):
image = tensor[0]
point = tensor[1]
x_location = int(K.cast(point[0],"int32"))
y_location = int(K.cast(point[1],"int32"))
print("x: {}, y: {}".format(x,y))
chunk = x_test[i][x_location:x_location + chunk_size, y_location:y_location + chunk_size]
flattened = K.concatenate(chunk).ravel()
flattened = K.append(chunk, x_location)
flattened = K.append(chunk, y_location)
#flattened = np.array(flattened)
#return tf.convert_to_tensor(flattened)
My training data looks like this:
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train / 255
x_test = x_test / 255
However, Keras complains "int() argument must be a string, a bytes-like object or a number, not 'Tensor'", as it is unable to convert the tensors to integers, which I need for cropping the image. How shall I write my Lambda layer to crop the image dynamically based on the predicted origins?
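For what it's worth, one way this kind of dynamic crop is commonly expressed without converting tensors to Python ints is tf.image.crop_and_resize, which takes the crop boxes as a tensor (and, with bilinear sampling, also lets gradients flow back to the predicted coordinates). Below is a rough sketch for a single predicted origin per image, not a verified drop-in solution: image_input and predicted_points are placeholder names, chunk_size is an assumed crop size, and the num_samples dimension from the model above would still need an extra reshape or map.
import tensorflow as tf
from tensorflow.keras.layers import Lambda

chunk_size = 8      # assumed crop size in pixels
img_size = 28.0     # MNIST images are 28x28

def crop_patches(tensors):
    images, points = tensors                     # images: (batch, 28, 28, 1), points: (batch, 2)
    y1 = points[:, 0] / img_size                 # normalize the predicted origin to [0, 1]
    x1 = points[:, 1] / img_size
    y2 = y1 + chunk_size / img_size
    x2 = x1 + chunk_size / img_size
    boxes = tf.stack([y1, x1, y2, x2], axis=1)   # one normalized [y1, x1, y2, x2] box per image
    box_indices = tf.range(tf.shape(images)[0])  # box i is cropped from image i
    return tf.image.crop_and_resize(images, boxes, box_indices, (chunk_size, chunk_size))

cropped = Lambda(crop_patches)([image_input, predicted_points])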
I am trying to train a 3D CNN on images (in the form of NumPy arrays) of dimension (19, 163, 279). My X_train has a shape of (740, 19, 163, 279) and y_train has a shape of (185, 19, 163, 279).
if K.image_data_format() == 'channels_first':
INPUT_SHAPE = (1, 19, 163, 279)
else:
INPUT_SHAPE = (19, 163, 279, 1)
And this is my model
def get_model(width=163, height=279, depth=19):
"""Build a 3D convolutional neural network model."""
inputs = keras.Input(INPUT_SHAPE)
x = layers.Conv3D(filters=64, kernel_size=3, activation="relu",padding='same')(inputs)
x = layers.MaxPool3D(pool_size=2,padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv3D(filters=64, kernel_size=3, activation="relu",padding='same')(x)
x = layers.MaxPool3D(pool_size=2,padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv3D(filters=128, kernel_size=3, activation="relu", padding='same')(x)
x = layers.MaxPool3D(pool_size=2,padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv3D(filters=64, kernel_size=3, activation="relu", padding='same')(x)
x = layers.MaxPool3D(pool_size=2,padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.GlobalAveragePooling3D()(x)
x = layers.Dense(units=16, activation="relu")(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(units=1, activation="sigmoid")(x)
# Define the model.
model = keras.Model(inputs, outputs, name="3dcnn")
return model
# Build model.
model = get_model(width=163, height=279, depth=19)
model.summary()
However, while training using the following code,
# Train the model, doing validation at the end of each epoch
epochs = 100
model.fit(
X_train,
validation_data=y_test,
epochs=epochs,
shuffle=True,
verbose=2,
callbacks=[checkpoint_cb, early_stopping_cb],
)
I am getting the following error
Error when checking input: expected input_1 to have 5 dimensions, but got array with shape (740, 19, 163, 279)
How do I solve this?
I don't think you're clear on what the code is doing.
For plain images you should use a 2D CNN. 3D CNNs are used for volumetric data such as CT scans and MRI images.
Your input shape has to be the shape of a single image, which does not include the number of examples. With the channel axis, your INPUT_SHAPE should be (19, 163, 279, 1), and the data needs a matching channel axis added.
Your kernel_size should have as many dimensions as the convolution layer; for a Conv2D layer, for example, kernel_size would be (3, 3).
In model.fit() you have to pass X_train, then y_train, then the rest of the parameters. How else will the model know which labels to learn from?
In model.fit(), the validation_data argument is a tuple (X_test, y_test).
Hope this clarifies everything. A short sketch putting these points together follows:
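A minimal sketch pulling those fixes together, assuming a channels_last layout (so a trailing channel axis is added to the data to match the 5-dimensional input the Conv3D layers expect) and that X_test and y_test were prepared the same way as the training arrays:
import numpy as np

# Add the channel axis: (740, 19, 163, 279) -> (740, 19, 163, 279, 1)
X_train = np.expand_dims(X_train, axis=-1)
X_test = np.expand_dims(X_test, axis=-1)

model.fit(
    X_train, y_train,                  # inputs and labels are passed together
    validation_data=(X_test, y_test),  # validation_data is an (inputs, targets) tuple
    epochs=100,
    shuffle=True,
    verbose=2,
    callbacks=[checkpoint_cb, early_stopping_cb],
)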
I am training a CNN for image classification. Specifically, I am trying to create a lip reader that is able to classify an image of a segmented mouth with its associated phoneme. The images have a dimension of 64x64 and are flattened into a 1D array of length 4096. I have inserted the code for my current model below with its performance graphs and metrics. Does anyone have any advice for how I can continue to modify this model in order to raise the accuracy?
df = pd.read_csv("/kaggle/input/labeled-frames-resized/labeled_frames.csv", error_bad_lines=False)
labelencoder = LabelEncoder()
df['Phoneme'] = labelencoder.fit_transform(df['Phoneme'])
labels = np.asarray(df[['Phoneme']].copy())
df = df.drop(df.columns[0], axis = 1)
X_train, X_test, y_train, y_test = train_test_split(df, labels, random_state = 42, test_size = 0.2, stratify = labels)
X_train = tf.reshape(X_train, (8113, 4096, 1))
X_test = tf.reshape(X_test, (2029, 4096, 1))
model = Sequential()
model.add(Conv1D(filters= 128, kernel_size=3, activation ='relu',strides = 2, padding = 'valid', input_shape= (4096, 1)))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters= 128, kernel_size=3, activation ='relu',strides = 2, padding = 'valid'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.5))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters= 128, kernel_size=3, activation ='relu',strides = 2, padding = 'valid'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.2))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters= 128, kernel_size=3, activation ='relu',strides = 2, padding = 'valid'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.2))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(39))
model.add(Activation('softmax'))
optimizer = keras.optimizers.Adam(lr=0.4)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train,y_train, epochs = 500, batch_size = 2048, validation_data = (X_test, y_test), shuffle = True)
You can easily convert it into 2D Convolution:
model.add(Conv2D(filters= 128, kernel_size=(3,3), activation ='relu',strides = (2,2),
padding = 'valid', input_shape= (64,64,1)))
model.add(MaxPooling2D(pool_size=(2,2)))
...
model.add(Flatten())
model.add(Dense(39))
model.add(Activation('softmax'))
I've only worked with Conv1d so far because it seemed easier.
Can 1D Convolution be used on images?
Yes, you can, but it is not recommended unless you have a very specific case and know what you are doing. Assume your images are 1024x1024: what happens when you flatten them? The information you extract with 2D convolutions is richer than what you get with 1D convolutions.
Explanation:
You can indeed use 1D convolution on images, but not in every situation. (I might be wrong.) When you flatten them, every pixel becomes a feature. If we wanted every pixel to be a feature, we could also just use regular Dense layers after flattening, but there would be a lot of parameters to train. What I mean by this (total parameter count not shown):
model= tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(...)
...
])
When you flatten them, you might break the spatial coherence of the images. Using 2D convolutions might gain you accuracy. With 2D convolutions, we visit the image and see what we can extract as important features, with max or average pooling.
You will not be able to catch as much information with 1D convolutions.
We can then feed the pooled feature maps into fully connected layers before making predictions.
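To make that concrete, here is a fuller sketch of a 2D version of the model from the question, assuming the flattened rows are reshaped back into 64x64 images; the layer sizes, batch size, and epoch count here are illustrative assumptions, not tuned values:
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# Undo the flattening: (num_samples, 4096, 1) -> (num_samples, 64, 64, 1)
X_train_2d = tf.reshape(X_train, (-1, 64, 64, 1))
X_test_2d = tf.reshape(X_test, (-1, 64, 64, 1))

model_2d = Sequential([
    Conv2D(128, (3, 3), strides=(2, 2), activation='relu', input_shape=(64, 64, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), padding='same', activation='relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.5),
    Flatten(),
    Dense(39, activation='softmax'),
])
model_2d.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model_2d.fit(X_train_2d, y_train, epochs=50, batch_size=256,
                       validation_data=(X_test_2d, y_test), shuffle=True)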
I have some faces cropped out of images, and I want to run them through a denoising autoencoder, the code for which I got from here. When I run the code on the MNIST dataset, the results look fine, like the ones on the website. However, when I run it on my own images, I get a mostly or completely black image in return instead of simply the same image without the noise.
This is the original image for reference before I resized it, so you can tell how it looks.
This is the image after resizing, which I had to do in order to feed it to the autoencoder; I sized it down to 28x28.
These are the results plotted. In the first row, I expected my original grayscale image (before it was fed into the autoencoder) to appear; in the second row, I wanted the same image but without the noise. As you can see, I get these odd outputs and I can't tell why.
Here is the code I've tried on the MNIST dataset. For my dataset, I skipped the MNIST preprocessing and instead preprocessed my own images (sized them down, made them grayscale; their dimensions are (28, 28, 1), just like the original code intended). I tried changing the number of epochs (I went through 10, 50, and 100), but there was no noticeable difference. I considered changing the layers, but after looking at some papers and other code, the layers seem to be the same as the ones presented. I tried looking up tutorials where the autoencoder works on regular images like mine and not just the MNIST dataset, but I couldn't really find any. I'm also confused as to why, when I plot the original array, I get black squares, even though when I use cv2_imshow to display it I get the image I showed after resizing; I don't really know if it's the same issue. I've also tried training the autoencoder on my own dataset (which has 785 images similar to the ones I've shown above), but to no avail. I've displayed the code I used below, and if something is missing that's needed to understand my question, please tell me.
import keras
from keras.datasets import mnist
import numpy as np
import matplotlib.pyplot as plt
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import backend as K
from keras.callbacks import TensorBoard
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1)) # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1)) # adapt this if using `channels_first` image data format
noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
n = 10
plt.figure(figsize=(20, 2))
for i in range(1, n):
ax = plt.subplot(1, n, i)
plt.imshow(x_test_noisy[i].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
input_img = Input(shape=(28, 28, 1)) # adapt this if using `channels_first` image data format
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (7, 7, 32)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
autoencoder.fit(grayscale, grayscale,
epochs=100,
batch_size=128,
shuffle=True,
validation_data=(grayscale, grayscale),
callbacks=[TensorBoard(log_dir='/tmp/tb', histogram_freq=0, write_graph=True)])
Here is the code I used to feed my image into the autoencoder and display the results.
arr= cv2.imread('/content/FramesResized/frame0000sec.jpg')
#Converting the image to grayscale
gray = cv2.cvtColor(arr, cv2.COLOR_BGR2GRAY)
#Adding an axis: when the image was converted to grayscale it became (28, 28), and I need it to be (28, 28, 1)
if gray.ndim == 2:
gray = np.expand_dims(gray, axis=2)
#Making a new array to hold my images, of which there is currently only one
grayscale = np.zeros([785, 28, 28, 1], dtype=np.uint8)
grayscale[0] = gray
#Feeding my image into the autoencoder
decoded_imgs = autoencoder.predict(grayscale)
#Plotting the before and after images
plt.figure(figsize=(20, 4))
for i in range(1, n):
# display original
ax = plt.subplot(2, n, i)
plt.imshow(grayscale[i].reshape(28,28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
# display reconstruction
ax = plt.subplot(2, n, i + n)
plt.imshow(decoded_imgs[i].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
If anyone is wondering, I believe the issue was that I was applying an autoencoder that was trained on MNIST images to complex, RGB images, so the autoencoder reconstruction was very poor.
When I used the CIFAR-100 dataset to train the autoencoder, the results were much more coherent.
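A side note on the code above, separate from that explanation: the MNIST arrays are scaled to [0, 1] with astype('float32') / 255. before training, while the custom grayscale array stays uint8 in [0, 255]. Scaling it the same way before both fit and predict keeps it consistent with the sigmoid output and the binary cross-entropy loss:
# Scale the custom images the same way as the MNIST data
grayscale = grayscale.astype('float32') / 255.
decoded_imgs = autoencoder.predict(grayscale)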