How to detect multiple objects using my classification network?

I have created a simple convolution network using keras that comes packed with tensorflow. I have trained the model and the accuracy looks good.
I have trained the network on 10 different classes. The network is able to differentiate between each of the 10 classes with an accuracy of 0.93.
Now, it is very much possible that there are multiple classes in the same image. Is there a way I could use my trained network to detect multiple objects in the same image? The best thing would be to get the coordinates/bounding-box around the objects detected, so that it is easier to test/visualize.
Here is how I wrote the network:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.BatchNormalization(input_shape=x_train.shape[1:]))
model.add(tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation='elu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(tf.keras.layers.Dropout(0.25))
model.add(tf.keras.layers.BatchNormalization(input_shape=x_train.shape[1:]))
model.add(tf.keras.layers.Conv2D(128, (5, 5), padding='same', activation='elu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Dropout(0.25))
model.add(tf.keras.layers.BatchNormalization(input_shape=x_train.shape[1:]))
model.add(tf.keras.layers.Conv2D(256, (5, 5), padding='same', activation='elu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(tf.keras.layers.Dropout(0.25))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(256))
model.add(tf.keras.layers.Activation('elu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(10))
model.add(tf.keras.layers.Activation('softmax'))
model.compile(
    optimizer=tf.train.AdamOptimizer(learning_rate=1e-3),
    loss=tf.keras.losses.sparse_categorical_crossentropy,
    metrics=['sparse_categorical_accuracy']
)
def train_gen(batch_size):
    while True:
        offset = np.random.randint(0, x_train.shape[0] - batch_size)
        yield x_train[offset:offset+batch_size], y_train[offset:offset + batch_size]
model.fit_generator(
    train_gen(512),
    epochs=15,
    steps_per_epoch=100,
    validation_data=(x_valid, y_valid)
)
This works fine. How could I use this network to detect multiple objects from the 10 classes? Would I have to re-train the network in some way?

In order to teach your model to detect more than one class per image, you will need to perform a few changes to your model and data, and re-train it.
Your final activation will now need to be a sigmoid, since you will not predict a single class probability distribution anymore. Now you want each output neuron to predict a value between 0 and 1, with more than one neuron possibly having values close to 1.
Your loss function should now be binary_crossentropy, since you will treat each output neuron as an independent prediction, which you will compare to the true label.
As I see you have been using sparse_categorical_crossentropy, I assume your labels were integers. You will want to change your label encoding to a multi-hot style (one-hot generalised to several classes per image): each label is a vector of length num_classes, with 1's at the positions of the classes present in the image and 0's everywhere else.
With these changes, you can now re-train your model to learn to predict more than one class per image.
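The 0/1 label vectors described above (often called multi-hot) can be sketched with NumPy as follows. This is a minimal illustration, not the asker's actual data pipeline: the helper names `to_multi_hot` and `decode_predictions`, the 0.5 threshold, and the example labels and outputs are all assumptions made for the example.

```python
import numpy as np

NUM_CLASSES = 10  # the network in the question has 10 output classes


def to_multi_hot(label_sets, num_classes=NUM_CLASSES):
    """Turn a list of per-image class-index lists into a 0/1 target matrix."""
    targets = np.zeros((len(label_sets), num_classes), dtype=np.float32)
    for row, labels in enumerate(label_sets):
        targets[row, list(labels)] = 1.0
    return targets


def decode_predictions(probabilities, threshold=0.5):
    """Per image, return the class indices whose sigmoid output exceeds the threshold."""
    return [np.flatnonzero(row > threshold).tolist() for row in probabilities]


# Hypothetical labels: image 0 contains classes 2 and 7, image 1 only class 4.
y_multi_hot = to_multi_hot([[2, 7], [4]])

# Hypothetical sigmoid outputs for two images (rows need not sum to 1).
fake_outputs = np.array([
    [0.1, 0.0, 0.9, 0.2, 0.1, 0.0, 0.3, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.1, 0.7, 0.1, 0.0, 0.2, 0.1, 0.3],
])
predicted_classes = decode_predictions(fake_outputs)
print(y_multi_hot)
print(predicted_classes)
```

On the Keras side, the matching changes would be a final Dense(10) with a 'sigmoid' activation and loss='binary_crossentropy', with targets shaped like `y_multi_hot`. The threshold is a tunable choice, not something fixed by the method.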
As for predicting bounding boxes around the objects, that is a very different and much more challenging task. Dedicated object-detection models such as YOLO or R-CNN can do this, but their structure is much more complex.

Related

How to output a 3D tensor from a neural network?

My main input feature is 60x256x256 numpy array that is meant to generate a 60x256x256 binary mask (also in the form of a numpy array). The binary mask functions as a label, but I do not know how to generate a 3D numpy array or tensor output from my neural network. This is my current code:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=(5, 5), strides=(1, 1),
                                 activation='relu',
                                 input_shape=(60, 256, 256)))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(tf.keras.layers.Conv2D(64, (5, 5), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1000, activation='relu'))
model.add(tf.keras.layers.Dense(256, activation='softmax'))
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.CosineSimilarity(),
    metrics=[tf.keras.metrics.CosineSimilarity()],
)
model.fit(
    train,
    epochs=6,
    validation_data=ds_valid,
)
In short, I want the output of the last layer to match the input layer so that it can work with the CosineSimilarity loss function. Any suggestions other than this CNN-based approach will also be very helpful, as it seems CNNs are mostly used for classification.

At the most basic level you can use tf.keras.layers.Reshape. See https://www.tensorflow.org/tutorials/generative/autoencoder
So your last two layers could be:
model.add(tf.keras.layers.Dense(60*256*256))
model.add(tf.keras.layers.Reshape((60, 256, 256)))
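To get a feel for what the Dense-plus-Reshape option costs, here is a rough parameter count. It assumes the new Dense layer is fed by the Dense(1000) layer from the question, which is one reading of "your last two layers"; the helper name `dense_param_count` is made up for this sketch.

```python
def dense_param_count(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


# Number of values in one 60x256x256 output mask.
output_values = 60 * 256 * 256

# Assumption: the layer before it is the question's Dense(1000).
params = dense_param_count(1000, output_values)
gib_float32 = params * 4 / 2**30  # 4 bytes per float32 weight

print(f"{output_values:,} outputs, {params:,} parameters, {gib_float32:.1f} GiB")
```

Under that assumption the final layer alone holds roughly 3.9 billion parameters, which is why this route is mainly useful as a shape demonstration.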
However, I think what you're looking for is an autoencoder-type network that uses tf.keras.layers.Conv2DTranspose layers.
The above link is an intro to Autoencoders and should be a good starting point I think.
Not sure about your use case, but I think it's very likely you do want a convolution-based approach, because when you flatten the convolution output you force your network to forget all the information about the symmetry of the problem (i.e., that it is a picture in 2D space). I don't think the fact that your problem is a regression problem affects this.
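To see how a convolutional encoder/decoder can hand back an output with the same spatial size as its input, the size arithmetic can be sketched in plain Python. The formulas follow the usual Keras conventions for 'same' and 'valid' padding without explicit output padding; the layer settings (3x3 kernels, stride 2, two layers each way) are assumptions chosen for illustration, not taken from the question.

```python
import math


def conv_output_length(size, kernel, stride, padding):
    """Spatial output size of a Conv2D along one axis."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError(f"unsupported padding: {padding}")


def conv_transpose_output_length(size, kernel, stride, padding):
    """Spatial output size of a Conv2DTranspose along one axis."""
    if padding == "same":
        return size * stride
    if padding == "valid":
        return size * stride + max(kernel - stride, 0)
    raise ValueError(f"unsupported padding: {padding}")


size = 256
# Encoder: two stride-2 convolutions halve the size twice.
encoded = conv_output_length(conv_output_length(size, 3, 2, "same"), 3, 2, "same")
# Decoder: two stride-2 transposed convolutions double it twice.
decoded = conv_transpose_output_length(
    conv_transpose_output_length(encoded, 3, 2, "same"), 3, 2, "same"
)
print(size, "->", encoded, "->", decoded)
```

With 'same' padding and matching strides the decoder undoes the encoder's downsampling, so no Flatten or giant Dense layer is needed to reach the target shape.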

My CNN image recognition model produces fluctuating validation loss

My model is experiencing wild and big fluctuations in the validation loss and does not converge.
I am doing an image recognition project with my three dogs i.e. classifying the dog in the image. Two dogs are very similar and the 3rd is very different. I took 10 minute videos of each dog, separately. Frames were extracted as images at each second. My dataset consists of about 1800 photos, 600 of each dog.
This block of code is responsible for augmenting and creating the data to feed the model.
randomize = np.arange(len(imArr)) # imArr is the numpy array of all the images
np.random.shuffle(randomize) # Shuffle the images and labels
imArr = imArr[randomize]
imLab= imLab[randomize] # imLab is the array of labels of the images
lab = to_categorical(imLab, 3)
gen = ImageDataGenerator(zoom_range = 0.2,horizontal_flip = True , vertical_flip = True,validation_split = 0.25)
train_gen = gen.flow(imArr,lab,batch_size = 64, subset = 'training')
test_gen = gen.flow(imArr,lab,batch_size =64,subset = 'validation')
This picture is the result of the model below.
model = Sequential()
model.add(Conv2D(16, (11, 11),strides = 1, input_shape=(imgSize,imgSize,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides = 2))
model.add(BatchNormalization(axis=-1))
model.add(Conv2D(32, (5, 5),strides = 1))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides = 2))
model.add(BatchNormalization(axis=-1))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides = 2))
model.add(BatchNormalization(axis=-1))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(BatchNormalization(axis=-1))
model.add(Dropout(0.3))
#Fully connected layer
model.add(Dense(256))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(3))
model.add(Activation('softmax'))
sgd = SGD(lr=0.004)
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
batch_size = 64
epochs = 100
model.fit_generator(train_gen, steps_per_epoch=(len(train_gen)), epochs=epochs, validation_data=test_gen, validation_steps=len(test_gen),shuffle = True)
Things I have tried.
High/low Learning rate ( 0.01 -> 0.0001)
Increase Dropout to 0.5 in both Dense layers
Increase/Decrease size of both Dense Layers ( 128 min -> 4048 max)
Increased number of CNN layers
Introduced Momentum
Increased/Decreased Batch Size
Things I have not tried
I have not used any other loss or metric
I have not used any other optimiser.
Have not adjusted any parameters of the CNN layers
It seems that there is some form of randomness or too many parameters in my model. I am aware that it is currently overfitting, but that should not be the cause of the volatility(?).
I am not too worried about the performance of the model. I would like to achieve about a 70% accuracy. All I want to do now is to stabilise the validation accuracy and to converge.
Note:
At some epochs, the training loss is very low ( <0.1 ) but validation loss is very high ( > 3 ).
The videos are taken on different backgrounds, but +- the same amount on each background for each dog.
Some images are a bit blurry.

Adam is definitely the better choice of optimizer. Note that your code already uses it, but only with default parameters: you create an SGD optimizer, yet the compile line passes a new Adam() with no arguments, so the SGD object is never used. Play with the actual parameters of your optimizer.
I encourage you to take out the dropout first and see what happens; then, if you manage to overfit, start with a low dropout rate and increase it.
It might also be that some of your validation samples are very hard to classify and thus increase the loss. Maybe disable shuffling in the validation set and watch for any periodicities, to find out whether some validation samples are hard to classify.
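One way to look for hard validation samples is to compute the loss per sample instead of per epoch and rank the results. The sketch below does this with NumPy on made-up numbers; in practice `y_prob` would come from something like model.predict on the validation images in a fixed order. The function names and the example arrays are assumptions for illustration.

```python
import numpy as np


def per_sample_crossentropy(y_true, y_prob, eps=1e-7):
    """Categorical cross-entropy of each sample, given one-hot labels."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_prob), axis=1)


def hardest_samples(y_true, y_prob, top_k=3):
    """Indices and losses of the top_k samples with the highest loss."""
    losses = per_sample_crossentropy(y_true, y_prob)
    order = np.argsort(losses)[::-1][:top_k]
    return order.tolist(), losses[order].tolist()


# Hypothetical one-hot labels and softmax outputs for four validation images.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
y_prob = np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
])

indices, losses = hardest_samples(y_true, y_prob, top_k=2)
print(indices, losses)
```

If the same indices keep showing up across epochs, those images are worth inspecting by eye (blur, unusual background, wrong label).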
Hope it helps!

I see you have tried a lot of different things. A few suggestions:
I see you use large filters in your Conv2D layers, e.g. 11x11 and 5x5. If your image dimensions are not very big, you should definitely go for smaller filters such as 3x3.
Try different optimizers; try Adam with varying lr if you haven't.
Otherwise, I don't see many problems. Maybe you need more data for the network to learn better.
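For a rough sense of what the filter-size suggestion changes, the parameter count of a convolution layer can be worked out by hand. The sketch uses the first layer from the question (16 filters on a 3-channel image); the helper name `conv2d_param_count` is made up for this example.

```python
def conv2d_param_count(kernel_size, in_channels, filters):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel_size * kernel_size * in_channels * filters + filters


# First layer of the question's model: 16 filters over an RGB input.
params_11x11 = conv2d_param_count(11, 3, 16)
params_3x3 = conv2d_param_count(3, 3, 16)

print(params_11x11, params_3x3)
```

Besides having far fewer weights, a 3x3 kernel without padding trims only 2 pixels from each spatial dimension, where an 11x11 kernel trims 10, which matters when the input images are small.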

CNN Model overfitting and not learning correctly from OpenCV sequences when learning games

I'm trying to get a Convolution Neural Network up and running to be able to play the old NES Ice Climbers. Right now I have utilized OpenCV to capture the screen for inputs and the output is the action of the iceclimber, such as walk left - right or jump. The problem I'm running into is the trained model doesn't actually learn properly or it's overfitting from when I validate it.
I've tried lowering the outputs by excluding the jump command. I've tried different batch sizes, epochs, and different test data. I've also tried changing the optimizer and dimensions but nothing had a significant impact.
Here is the code for when I'm capturing the screen and using that data to train my model. My training data is 900 sequential screen captures with the respective inputs pushed that I played. I have around 10k sequences saved from playing for the training data.
def screen_record():
    global last_time
    printscreen = np.array(ImageGrab.grab(bbox=(0,130,800,640)))
    last_time = time.time()
    processed = greycode(printscreen)
    processed = cv2.resize(processed, (80, 60))
    cv2.imshow('AIBOX', processed)
    cv2.moveWindow("AIBOX", 500, 150)
    #training.append([processed, check_input()])
    processed = np.array(processed).reshape(-1, 80, 60, 1)
    result = AI.predict(processed, batch_size=1)
    print(result)
    AI_Control_Access(result)
def greycode(screen):
    greymap = cv2.cvtColor(screen, cv2.COLOR_BGR2GRAY)
    greymap = cv2.Canny(greymap, threshold1=200, threshold2=300)
    return greymap
def network_train():
    train_data = np.load('ICE_Train5.npy')
    train = train_data[::7]
    test = train_data[-3::]
    x_train = np.array([i[0] for i in train]).reshape(-1,80,60,1)
    x_test = np.array([i[0] for i in test]).reshape(-1,80,60,1)
    y_train = np.asarray([i[1] for i in train])
    y_test = np.asarray([i[1] for i in test])
    model = Sequential()
    model.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(80, 60, 1)))
    model.add(Convolution2D(16, (5, 5), activation='relu', strides=4))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2, activation='softmax'))
    sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer='sgd')
    model.fit(x_train, y_train,batch_size=450,epochs=50,verbose=1,callbacks=None,validation_split=0,validation_data=None,shuffle=False,
              class_weight=None,sample_weight=None,initial_epoch=0,steps_per_epoch=None,validation_steps=None)
When I run it against the test data for validation, the highest accuracy I could get was around 16%. Even when I use that model to actually play the game, it always predicts the same button press, so I think the model is either overfitting or not learning at all. Since this is my first time using a convolutional network, I'm unsure how to tweak the network to be more responsive to training.

The general setup sounds more like a good environment for reinforcement learning.
If you want to stick with the supervised learning setup, you should first check if your different classes have the same amount of training examples. If that's the case, you could experiment with the learning rate, more regularization (dropout), the network architecture etc.
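Checking the class balance takes only a few lines. The sketch below assumes each recorded action is a one-hot list such as [1, 0] or [0, 1], matching the two-unit softmax in the question; the example data and the function name `class_distribution` are made up.

```python
from collections import Counter


def class_distribution(one_hot_labels):
    """Fraction of training examples belonging to each class index."""
    counts = Counter(list(label).index(max(label)) for label in one_hot_labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())}


# Hypothetical recording: 8 frames of one action, 2 frames of the other.
labels = [[1, 0]] * 8 + [[0, 1]] * 2

distribution = class_distribution(labels)
# Accuracy of a model that always predicts the most frequent action.
majority_baseline = max(distribution.values())
print(distribution, majority_baseline)
```

If one action dominates like this, a network that always outputs that action already scores the majority baseline, which would match the "always predicts the same button" symptom.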

Validation accuracy constant in Keras CNN for multiclass image classification

I'm performing a multiclass image classification task. While training my CNN the validation accuracy remains constant across all epochs. I've tried different model architectures and different hyperparameter values but no change. Any ideas would be greatly appreciated. Here are my current results:
Train and Validation Loss and Accuracy
Here is my CNN:
model = models.Sequential()
model.add(Conv2D(32, (3, 3), activation = 'relu',
                 input_shape=(img_width, img_height, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation = 'relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation = 'relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.2))
model.add(layers.Dense(64, activation = 'relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(8, activation = 'softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999,
                                        epsilon=1e-08, decay=0.0001),
              metrics = ['acc'])
model.summary()

There are a variety of possible underlying factors that can potentially cause this phenomenon - below is a list, by no means exhaustive, of some preliminary fixes you could try:
If you're using the Adam optimizer (or any other adaptive learning rate optimizer such as RMSprop or Adadelta), try a significantly smaller initial learning rate than the default, somewhere on the order of 10E-6. Alternatively, try Stochastic Gradient Descent with an initial learning rate somewhere in the regime of 10E-2 to 10E-3. You could also set a large initial learning rate and anneal it over the course of several training epochs by employing Keras's LearningRateScheduler callback and defining a custom learning rate schedule (for SGD).
If the above doesn't work, try decreasing the complexity of your network (e.g. the number of layers) and increasing the size of the training set. Also, while you're inspecting your training dataset, ensure that it doesn't suffer from class imbalance. If it does, you can artificially weight the losses associated with the underrepresented class's training examples using the class_weight parameter that can be passed to the model's fit() method.
If the issue still persists, you may have to confront the possibility that a constant validation loss is an artifact of essentially fitting on noise, and that any (even somewhat plausible) predictions the model emits may be spurious. At this point you may want to try extracting more informative features or a larger variety of features, or performing extensive data augmentation on your training set.
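The annealing idea from the first suggestion can be sketched as a plain schedule function. The step-decay shape and every number in it (initial rate 1e-2, halving every 10 epochs) are assumptions picked for illustration; the function takes the `(epoch, lr)` arguments that Keras's LearningRateScheduler passes to its schedule.

```python
def step_decay(epoch, lr=None, initial_lr=1e-2, drop=0.5, epochs_per_drop=10):
    """Return the learning rate for a given (0-based) epoch.

    The current rate `lr` is accepted for compatibility with the callback
    but ignored: the schedule depends only on the epoch number.
    """
    return initial_lr * (drop ** (epoch // epochs_per_drop))


# Learning rate at a few epochs of a hypothetical 30-epoch run.
schedule_preview = {epoch: step_decay(epoch) for epoch in (0, 9, 10, 25)}
print(schedule_preview)

# With TensorFlow installed, the schedule would be attached roughly like this:
#   callback = tf.keras.callbacks.LearningRateScheduler(step_decay)
#   model.fit(..., callbacks=[callback])
```

Other shapes (exponential decay, warm-up then decay) fit the same function signature, so it is easy to swap schedules while keeping the rest of the training code unchanged.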
Have a look at this GitHub issue for further suggestions that may help resolve your problem:
https://github.com/keras-team/keras/issues/1597
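For the class imbalance suggestion above, one common heuristic is to weight each class inversely to its frequency. The sketch below computes such weights from integer labels; the "balanced" formula, the function name and the example label counts are assumptions for illustration, and the resulting dict has the class-index-to-weight form that fit() accepts as class_weight.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in sorted(counts.items())
    }


# Hypothetical integer labels: class 0 is common, class 2 is rare.
labels = [0] * 6 + [1] * 3 + [2] * 1

weights = balanced_class_weights(labels)
print(weights)

# With Keras, the dict would be passed as: model.fit(..., class_weight=weights)
```

With one-hot labels, the integer classes can be recovered first by taking the argmax of each label row.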

Training Keras autoencoder without bottleneck does not return original data

I'm trying to make an autoencoder using Keras with a tensorflow backend. In particular, I have data of a vector of n_components (i.e. 200) sampled n_times (i.e. 20000). It is key that when I train time t, that I compare it only to time t. It appears that it is shuffling the sampling times. I removed the bottleneck and find that the network is doing a pretty bad job of predicting the n_components, instead representing something more like the mean of the input scaled by each component.
Here is my network with the bottleneck commented out:
model = keras.models.Sequential()
# Make a 7-layer autoencoder network
model.add(keras.layers.Dense(n_components, activation='relu', input_shape=(n_components,)))
model.add(keras.layers.Dense(n_components, activation='relu'))
# model.add(keras.layers.Dense(50, activation='relu'))
# model.add(keras.layers.Dense(3, activation='relu'))
# model.add(keras.layers.Dense(50, activation='relu'))
model.add(keras.layers.Dense(n_components, activation='relu'))
model.add(keras.layers.Dense(n_components, activation='relu'))
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['accuracy'])
# act is a numpy matrix of size (n_components, n_times)
model.fit(act.T, act.T, epochs=15, batch_size=100, shuffle=False)
newact = model.predict(act.T).T
I have tested shuffling the second component of act, n_times, and passing it as model.fit(act.T, act_shuffled.T) and see no difference from model.fit(act.T, act.T). Am I doing something wrong? How can I force it to learn from the specific time?
Many thanks,
Arthur

I believe that I have solved the problem, but more knowledgeable users of Keras might be able to correct me. I had tried many different values for the argument batch_size of fit, but I didn't try a value of 1. When I changed it to 1, it did a good job of reproducing the input data.
I believe that the batch size, even if shuffle is set to False, allows the autoencoder to train one input time against an unrelated input time.
So, I have amended my code to:
model.fit(act.T, act.T, epochs=15, batch_size=1, shuffle=False)
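One thing that also changes with batch_size is how many weight updates the optimizer performs in the same number of epochs. The arithmetic below uses the figures from the question (20000 time samples, 15 epochs). It is offered as an additional factor worth checking, not as a confirmed explanation of the behaviour described above.

```python
import math


def gradient_updates(n_samples, batch_size, epochs):
    """Number of optimizer steps taken over a whole training run."""
    return math.ceil(n_samples / batch_size) * epochs


updates_batch_100 = gradient_updates(20000, 100, 15)
updates_batch_1 = gradient_updates(20000, 1, 15)
print(updates_batch_100, updates_batch_1)
```

A quick experiment to separate the two effects would be to keep batch_size=100 and train for many more epochs, or raise the learning rate, and see whether the reconstruction improves.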
