I am working on a multi-label image classification problem, using TensorFlow, Keras and Python 3.9.
I have built a dataset containing one .csv file with image names and their respective one-hot encoded labels (the sample rows are omitted here).
I also have an image folder with the associated image files. There are around 17,000 images, and each one can carry any subset of 29 possible labels. The dataset is fairly well balanced. The labels refer to the visual components found in an image; for example, the following image belongs to classes [02, 23, 05]:
2 - Human Beings
5 - Plants
23 - Arms, Armour
This method of image labelling is popular in trademark imaging and is known as the Vienna Classification.
Now, my goal is to perform predictions on similar images. For this, I am fine-tuning a VGG19 network with a custom prediction layer defined as follows:
prediction_layer = tf.keras.layers.Dense(29, activation=tf.keras.activations.sigmoid)
All images are properly resized to (224, 224, 3) and their RGB values scaled to [0, 1]. My network summary looks like this:
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_11 (InputLayer) [(None, 224, 224, 3)] 0
tf.__operators__.getitem_1 (None, 224, 224, 3) 0
(SlicingOpLambda)
tf.nn.bias_add_1 (TFOpLambd (None, 224, 224, 3) 0
a)
vgg19 (Functional) (None, 7, 7, 512) 20024384
global_average_pooling2d_3 (None, 512) 0
(GlobalAveragePooling2D)
dense_12 (Dense) (None, 29) 14877
=================================================================
Total params: 20,039,261
Trainable params: 14,877
Non-trainable params: 20,024,384
_________________________________________________________________
The problem I am facing concerns the actual training of the network. I am using Adam and the binary_crossentropy loss function, which I believe is adequate for multi-label problems. However, after around 5 hours of training, I am fairly disappointed with the accuracy it is achieving.
Epoch 10/10
239/239 [==============================] - 1480s 6s/step - loss: 0.1670 - accuracy: 0.1969 - val_loss: 0.1656 - val_accuracy: 0.1922
I am somewhat familiar with multi-class classification, but this is my first attempt at solving a multi-label problem. Am I failing at some point before training, is VGG19 not ideal for this task, or did I get my parameters wrong?
Multi-label problems are evaluated differently. Check out this answer. A low accuracy figure may mean very little here: consider that a prediction for one sample only counts as correct if the entire vector of 29 elements is correct, which is hard to achieve. For your example, that is:
[0,1,0,0,1,0,0,0,0...,1,0,0,0,0,0,0]
I recommend using binary accuracy, F1-score, Hamming loss, or coverage to evaluate your model, depending on which aspect of the prediction matters most in your context.
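For example, a minimal sketch of compiling with per-label metrics (assuming TF 2.x; the stand-in model below only mirrors the architecture in the question):

import tensorflow as tf

# Stand-in for the fine-tuned VGG19 model from the question (weights=None
# here just to keep the sketch self-contained).
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.VGG19(include_top=False, weights=None)(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(29, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)

# Score each of the 29 sigmoid outputs independently, so one wrong label
# no longer zeroes out the whole sample.
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=[
        tf.keras.metrics.BinaryAccuracy(threshold=0.5),
        tf.keras.metrics.Precision(),
        tf.keras.metrics.Recall(),
    ],
)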
I have just started implementing an LSTM in Python with TensorFlow/Keras to test out an idea, but I am struggling to create a proper model. This post is mainly about a ValueError that I often get (see the code at the bottom), but any and all help with creating a proper LSTM model for the problem below is greatly appreciated.
For each day, I want to predict which of a group of events will occur. The idea is that some events are recurring or always occur after a certain amount of time has passed, whereas other events occur only rarely or without any structure. An LSTM should be able to pick up on these recurring events in order to predict their occurrences for days in the future.
To represent the events, I use a list with values 0 and 1 (non-occurrence and occurrence). So for example, if I have the events ["Going to school", "Going to the gym", "Buying a computer"], I have lists like [1, 0, 1], [1, 1, 0], [1, 0, 1], [1, 1, 0], etc. The idea is then that the LSTM will recognize that I go to school every day, go to the gym every other day, and that buying a computer is very rare. So following the sequence of vectors, for the next day it should predict [1, 0, 0].
So far I have done the following:
Create x_train: a numpy.array with shape (305, 60, 193). Each entry of x_train contains 60 consecutive days, where each day is represented by a vector of the same 193 possible events, as described above.
Create y_train: a numpy.array with shape (305, 1, 193). Similar to x_train, but y_train only contains 1 day per entry.
x_train[0] consists of days 1, 2, ..., 60 and y_train[0] contains day 61. x_train[1] then contains days 2, ..., 61 and y_train[1] contains day 62, etc. The idea is that the LSTM should learn to use data from the past 60 days, and can then iteratively start predicting/generating new vectors of event occurrences for future days.
I am really struggling with how to create a simple implementation of an LSTM that can handle this. So far I think I have figured out the following:
I need to start with the below block of code, where N_INPUTS = 60 and N_FEATURES = 193. I am not sure what N_BLOCKS should be, or whether the value it should take is strictly bound by some conditions. EDIT: According to https://zhuanlan.zhihu.com/p/58854907 it can be whatever I want.
model = Sequential()
model.add(LSTM(N_BLOCKS, input_shape=(N_INPUTS, N_FEATURES)))
I should probably add a dense layer. If I want the output of my LSTM to be a vector with the 193 events, this should look as follows:
model.add(layers.Dense(193, activation='linear'))  # or some other activation function
I can also add a dropout layer to prevent overfitting, for example with model.add(layers.Dropout(0.2)), where 0.2 is the rate at which inputs are set to 0.
I need to add model.compile(loss=..., optimizer=...). I am not sure whether the loss function (e.g. MSE or categorical_crossentropy) and optimizer matter if I just want a working implementation.
I need to train my model, which I can achieve by using model.fit(x_train,y_train)
If all of the above works well, I can start to predict values for the next day using model.predict(the 60 days before the day I want to predict)
One of my attempts can be seen here:
print(x_train.shape)
print(y_train.shape)
model = keras.Sequential()
model.add(layers.LSTM(256, input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(y_train.shape[2], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
model.fit(x_train,y_train) #<- This line causes the ValueError
Output:
(305, 60, 193)
(305, 1, 193)
Model: "sequential_29"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_27 (LSTM) (None, 256) 460800
dense_9 (Dense) (None, 1) 257
=================================================================
Total params: 461,057
Trainable params: 461,057
Non-trainable params: 0
_________________________________________________________________
ValueError: Shapes (None, 1, 193) and (None, 193) are incompatible
Alternatively, I have tried replacing the line model.add(layers.Dense(y_train.shape[2], activation='softmax')) with model.add(layers.Dense(y_train.shape[1], activation='softmax')). This produces ValueError: Shapes (None, 1, 193) and (None, 1) are incompatible.
Are my ideas somewhat okay? How can I resolve this Value Error? Any help would be greatly appreciated.
EDIT: As suggested in the comments, changing the size of y_train did the trick.
print(x_train.shape)
print(y_train.shape)
model = keras.Sequential()
model.add(layers.LSTM(193, input_shape=(x_train.shape[1], x_train.shape[2])))  # the 193 here can be any number; see: https://zhuanlan.zhihu.com/p/58854907
model.add(layers.Dropout(0.2))
model.add(layers.Dense(y_train.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
model.fit(x_train,y_train)
(305, 60, 193)
(305, 193)
Model: "sequential_40"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_38 (LSTM) (None, 193) 298764
dropout_17 (Dropout) (None, 193) 0
dense_16 (Dense) (None, 193) 37442
=================================================================
Total params: 336,206
Trainable params: 336,206
Non-trainable params: 0
_________________________________________________________________
10/10 [==============================] - 3s 89ms/step - loss: 595.5011
Now I am stuck on the fact that model.predict(x) seems to require x to be the same size as x_train, and outputs an array with the same size as y_train. I was hoping only one set of 60 days would be required to output the 61st day. Does anyone know how to achieve this?
The solution may be to give y_train the shape (305, 193) instead of (305, 1, 193), since you predict one day; this does not change the data, just its shape. You should then be able to train and predict.
With model.add(layers.Dense(y_train.shape[1], activation='softmax')), of course.
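To predict just the next day, the model only needs a batch dimension of 1, not the whole training set. A minimal sketch, assuming the trained model and x_train from above:

import numpy as np

# One window of 60 days, shape (60, 193); the last training window is
# used here purely for illustration.
last_60_days = x_train[-1]

window = np.expand_dims(last_60_days, axis=0)  # (1, 60, 193): add batch dim
next_day = model.predict(window)               # (1, 193)
print(next_day[0].shape)                       # (193,): one predicted day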
I have time series of P processes, each of varying length but all having 5 variables (dimensions). I am trying to predict the estimated lifetime of a test process. I am approaching this problem with a stateful LSTM in Keras, but I am not sure if my training process is correct.
I divide each sequence into batches of length 30, so each sequence has the shape (s_i, 30, 5), where s_i is different for each of the P sequences (s_i = len(P_i)//30). I append all sequences into my training data, which has the shape (N, 30, 5), where N = s_1 + s_2 + ... + s_P.
Model:
# design network
model = Sequential()
model.add(LSTM(32, batch_input_shape=(1, train_X[0].shape[1], train_X[0].shape[2]), stateful=True, return_sequences=True))
model.add(LSTM(16, return_sequences=False))
model.add(Dense(1, activation="linear"))
model.compile(loss='mse', optimizer=Adam(lr=0.0005), metrics=['mse'])
The model.summary() looks like
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (1, 30, 32) 4864
_________________________________________________________________
lstm_2 (LSTM) (1, 16) 3136
_________________________________________________________________
dense_1 (Dense) (1, 1) 17
=================================================================
Training loops:
for epoch in range(epochs):
    mean_tr_acc = []
    mean_tr_loss = []
    for seq in range(train_X.shape[0]):  # 24
        # train on the whole sequence, batch by batch
        for batch in range(train_X[seq].shape[0]):  # 68
            b_loss, b_acc = model.train_on_batch(
                np.expand_dims(train_X[seq][batch], axis=0),
                train_Y[seq][batch][-1])
            mean_tr_acc.append(b_acc)
            mean_tr_loss.append(b_loss)
        # reset LSTM internal states after training on each complete sequence
        model.reset_states()
Edit:
The problem with the loss graph was that I was dividing the values in my custom loss, making them too small. If I remove the division and plot the loss graph logarithmically, it looks all right.
New Problem:
Once training is done, I try to predict. I show my model 30 time samples of a new process, so the input shape is the same as batch_input_shape during training, i.e. (1, 30, 5). The predictions I get for different batches of the same sequence are all the same.
I am almost sure I am doing something wrong in the training process. If anyone could help me out, I would be grateful. Thanks.
Edit 2:
So the model predicts exactly the same results only if it has been trained for more than 20 epochs. Otherwise the predicted values are very close, but still a bit different. I guess this is due to some kind of overfitting. Help!!!
(The loss plot for 25 epochs is omitted here.)
Usually when results are all the same, it's because your data isn't normalized. I suggest centering your data with mean = 0 and std = 1 using a simple normal transform, i.e. (data - mean) / std. Transform it like this before training and testing. Differences in how data is normalized between the training and testing sets can also cause problems, and may be the cause of your discrepancy between train and test loss. Always use the same normalization technique for all your data.
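A minimal sketch of that transform, assuming train_X from the question and a hypothetical test_X; the statistics are computed per variable on the training data only:

import numpy as np

# Per-variable statistics from the TRAINING data only; train_X ends in
# a trailing dimension of 5 variables.
mean = train_X.reshape(-1, 5).mean(axis=0)  # (5,)
std = train_X.reshape(-1, 5).std(axis=0)    # (5,)

train_X_norm = (train_X - mean) / std
test_X_norm = (test_X - mean) / std  # reuse the same transform; do not refit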
I am trying to classify the Kaggle 10k dog images into 120 breeds using Keras and ResNet50. Due to memory constraints on Kaggle (14 GB RAM), I have to use the ImageDataGenerator that feeds the images to the model and also allows data augmentation in real time.
The base convolutional ResNet50 model:
conv_base = ResNet50(weights='imagenet', include_top=False, input_shape=(224,224, 3))
My model:
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(120, activation='softmax'))
I make sure that only my newly added layers are trainable, so that the original ResNet50 weights will not be modified in the training process, and then compile the model:
conv_base.trainable = False
model.compile(optimizer=optimizers.Adam(), loss='categorical_crossentropy',metrics=['accuracy'])
Num trainable weights BEFORE freezing the conv base: 216
Num trainable weights AFTER freezing the conv base: 4
And the final model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
resnet50 (Model) (None, 1, 1, 2048) 23587712
_________________________________________________________________
flatten_1 (Flatten) (None, 2048) 0
_________________________________________________________________
dense_1 (Dense) (None, 256) 524544
_________________________________________________________________
dropout_1 (Dropout) (None, 256) 0
_________________________________________________________________
dense_2 (Dense) (None, 120) 30840
=================================================================
Total params: 24,143,096
Trainable params: 555,384
Non-trainable params: 23,587,712
_________________________________________________________________
The train and validation directories each have 120 subdirectories, one for each dog breed, containing the dog images. Keras is supposed to use these directories to get the correct label for each image: an image from a "beagle" subdir is classified automatically by Keras, so there is no need for one-hot encoding or anything like that.
train_dir = '../input/dogs-separated/train_dir/train_dir/'
validation_dir = '../input/dogs-separated/validation_dir/validation_dir/'
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    train_dir, target_size=(224, 224), batch_size=20, shuffle=True)
validation_generator = test_datagen.flow_from_directory(
    validation_dir, target_size=(224, 224), batch_size=20, shuffle=True)
Found 8185 images belonging to 120 classes.
Found 2037 images belonging to 120 classes.
Just to make sure these classes are right and in the right order, I compared train_generator.class_indices and validation_generator.class_indices; they are the same.
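That comparison can also be written as an explicit check (a one-line sketch using the generators defined above):

# Fail loudly if the breed-to-index mappings of the two generators differ.
assert train_generator.class_indices == validation_generator.class_indices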
Train the model:
history = model.fit_generator(train_generator,
                              steps_per_epoch=8185 // 20, epochs=10,
                              validation_data=validation_generator,
                              validation_steps=2037 // 20)
Note in the accuracy charts (omitted here) that while training accuracy improves as expected, the validation accuracy quickly settles around 0.008, which is 1/120... random prediction?!
I also swapped the train and validation sets and got the same issue: training accuracy improving while the validation accuracy stays stuck at approximately 0.008 = 1/120.
Any thoughts would be appreciated.
I've played with the batch size and found that batch_size = 120 (the number of directories in both the train and validation directories) eliminates the above issue. Now I can happily employ data augmentation techniques without crashing my Kaggle kernel on memory issues. Still I wonder...
How does the Keras ImageDataGenerator sample images from a directory in categorical classification mode: depth-wise or breadth-wise?
If depth-wise, then with a batch size of 20 it would go through the FIRST directory of, say, 100 photos (five times), then move to the next directory, process it in batches of 20, move to the next dir, and so on.
Or is it breadth-wise?
Breadth-wise, the initial batch of 20 would be ONE photo from each of the first 20 directories, then one from each of the next 20 directories?
I couldn't find in the documentation how the Keras ImageDataGenerator handles batches when used with flow_from_directory and fit_generator.
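One way to check empirically is to draw a batch and inspect its labels. A short sketch, assuming the generators defined above; as far as I can tell, flow_from_directory builds the complete file list up front and, with shuffle=True, shuffles it each epoch, so a batch normally mixes many directories:

import numpy as np

# Draw one batch and print which class each image belongs to.
images, labels = next(train_generator)
print(images.shape)               # (20, 224, 224, 3)
print(np.argmax(labels, axis=1))  # class index of each image in the batch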
I am using Keras with the TensorFlow backend. I want to use a pretrained U-Net model and replace its input layer with another one.
I trained the model on images of size (256, 256). When predicting a bigger scene, I want to manipulate the input so that the U-Net does what it does, just on another image size, so that I don't have to cut up the image. Here is my code:
model = load_model(model_path)              # load the pretrained U-Net
model.layers.pop(0)                         # remove the old input layer from the layer list
new_input = Input(shape=(512, 512, 3))      # new, larger input
model = Model(new_input, model(new_input))  # call the old model on the new input
Now when I am using
print(model.summary())
it outputs
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 512, 512, 3) 0
_________________________________________________________________
model_1 (Model) multiple 211825
=================================================================
and if I am doing
model.layers[1].summary()
I get
____________________________________________________________________________
Layer (type)                  Output Shape          Param #   Connected to
============================================================================
conv2d_1 (Conv2D)             (None, 256, 256, 16)  1216      input_1[0][0]
____________________________________________________________________________
batch_normalization_1 (Batch  (None, 256, 256, 16)  64        conv2d_1[0][0]
and so on. The output shape of conv2d_1 should now be (None, 512, 512, 16), but it was not updated properly (nor were the other layers). Further, when I use
model.layers[1].layers[0].output_shape
I get the same result as in the summary.
When I do a prediction with the adapted model, everything works fine with respect to the output. But if the image size is bigger than (512, 512), for example (4096, 4096), I run into memory/allocation problems on the GPU.
Therefore I want to calculate the memory needed to predict on an image, and cut the image up if it is too big. But to write a function that does this, I need the correct information about the output shapes.
Does anyone have any suggestions? Maybe I should replace the input layer in another way? Maybe I can update the model somehow?
Or does some Keras function already exist that calculates the needed memory? (I didn't find any.) Thanks for your attention! :)
There is a better alternative to what you are trying to achieve. Convolutional layers can handle arbitrary input sizes if you specify shape=(None, None, 3), meaning any height and width with 3 channels. You can train the original model like that, and then you don't have to adjust tensor/image shapes when you predict.
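A minimal sketch of the idea (illustrative layers only, not the original U-Net; assuming tf.keras):

import numpy as np
from tensorflow.keras import Input, Model, layers

# Size-agnostic input: any height and width, 3 channels.
inputs = Input(shape=(None, None, 3))
x = layers.Conv2D(16, 3, padding='same', activation='relu')(inputs)
x = layers.BatchNormalization()(x)
outputs = layers.Conv2D(1, 1, activation='sigmoid')(x)
model = Model(inputs, outputs)

# The same weights now accept 256x256 and 512x512 inputs alike.
print(model.predict(np.zeros((1, 256, 256, 3))).shape)  # (1, 256, 256, 1)
print(model.predict(np.zeros((1, 512, 512, 3))).shape)  # (1, 512, 512, 1)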
I am trying to perform regression using a neural network to predict a single output from 146 input features.
I applied standard scaling to all inputs and the output.
I monitor the mean absolute error after training, and it is unreasonably high on the train, validation, and test sets (so I am not even overfitting).
I suspect this is due to the fact that the output variable is very imbalanced (see histogram).
From the histogram it is possible to see that most of the samples are grouped around 0 but there is also another small group of samples around -5.
Histogram of the imbalanced output
This is model creation code:
input = Input(batch_shape=(None, X.shape[1]))
layer1 = Dense(20, activation='relu')(input)
layer1 = Dropout(0.3)(layer1)
layer1 = BatchNormalization()(layer1)
layer2 = Dense(5, activation='relu', kernel_regularizer='l2')(layer1)
layer2 = Dropout(0.3)(layer2)
layer2 = BatchNormalization()(layer2)
out_layer = Dense(1, activation='linear')(layer2)
model = Model(inputs=input, outputs=out_layer)
model.compile(loss='mean_squared_error', optimizer=optimizers.Adam(),
              metrics=['mae'])
This is the model summary:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 146) 0
_________________________________________________________________
dense_1 (Dense) (None, 20) 2940
_________________________________________________________________
dropout_1 (Dropout) (None, 20) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 20) 80
_________________________________________________________________
dense_2 (Dense) (None, 5) 105
_________________________________________________________________
dropout_2 (Dropout) (None, 5) 0
_________________________________________________________________
batch_normalization_2 (Batch (None, 5) 20
_________________________________________________________________
dense_3 (Dense) (None, 1) 6
=================================================================
Total params: 3,151
Trainable params: 3,101
Non-trainable params: 50
_________________________________________________________________
Looking at the actual model predictions, the large error mainly occurs for samples whose true output value is around -5 (the small group of samples).
I have tried many hyperparameter configurations, but the error is still very high.
I see many suggestions on performing neural network classification on imbalanced data, but what can be done for regression?
It seems odd to me that a regression neural network cannot learn this correctly. What am I doing wrong?
From your histogram, it looks as though non-zero outputs are rare. This is similar to a classification problem where we're trying to predict a rare class, in that a strong strategy in terms of the loss function is simply to guess the most common class, in this case your modal value of zero.
You should do some research around what people do to predict rare events or to classify inputs when some classes are rare. E.g. this discussion might be helpful: https://www.reddit.com/r/MachineLearning/comments/412wpp/predicting_rare_events_how_to_prevent_machine/
Some strategies you might try include
Removing most of the zero-output training examples so that your training data is more balanced
Creating or acquiring more non-zero training examples
Using a different machine learning algorithm (someone at the link I provided recommends boosting; I wonder if you'd get good results from a residual neural network structure, which is in some ways similar to boosting)
Re-structuring or rescaling your data to add more weight to the rare values (see the sketch after this list)
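As one concrete illustration of the last point, here is a minimal sketch of up-weighting the rare target values via Keras's sample_weight. The data, architecture, and weighting rule are all hypothetical, chosen only to mirror the shapes in the question:

import numpy as np
from tensorflow import keras

# Hypothetical data: most targets near 0, a rare group near -5.
X = np.random.randn(1000, 146).astype('float32')
y = np.where(np.random.rand(1000) < 0.05, -5.0, 0.0).astype('float32')

# Up-weight the rare group so its squared errors count more in the loss.
sample_weight = np.where(y < -2.5, 10.0, 1.0)

model = keras.Sequential([
    keras.layers.Dense(20, activation='relu', input_shape=(146,)),
    keras.layers.Dense(1, activation='linear'),
])
model.compile(loss='mse', optimizer='adam', metrics=['mae'])
model.fit(X, y, sample_weight=sample_weight, epochs=5, batch_size=32)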
It appears to me that you have a normal distribution with a very small standard deviation, in which case this should train just as well as any other distribution.