I need advice. I got a very poor result (10% accuracy) when building a CNN model with Keras on a subset of the CIFAR-10 dataset (only 10,000 images, 1,000 per class). How can I increase the accuracy? I tried changing/increasing the number of epochs, but the result stayed the same. Here is my CNN architecture:
cnn = models.Sequential()
cnn.add(layers.Conv2D(25, (3, 3), input_shape=(32, 32, 3)))
cnn.add(layers.MaxPooling2D((2, 2)))
cnn.add(layers.Activation('relu'))
cnn.add(layers.Conv2D(50, (3, 3)))
cnn.add(layers.MaxPooling2D((2, 2)))
cnn.add(layers.Activation('relu'))
cnn.add(layers.Conv2D(100, (3, 3)))
cnn.add(layers.MaxPooling2D((2, 2)))
cnn.add(layers.Activation('relu'))
cnn.add(layers.Flatten())
cnn.add(layers.Dense(100))
cnn.add(layers.Activation('relu'))
cnn.add(layers.Dense(10))
cnn.add(layers.Activation('softmax'))
compile and fit:
EPOCHS = 200
BATCH_SIZE = 10
LEARNING_RATE = 0.1
cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
loss='binary_crossentropy',
metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1)
mc = ModelCheckpoint(filepath=checkpoint_path, monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)
history_cnn = cnn.fit(train_images, train_labels, epochs=EPOCHS, batch_size=BATCH_SIZE,
validation_data=(test_images, test_labels),callbacks=[es, mc],verbose=0)
The data I use is CIFAR-10, but I only take 1,000 images per class, so the total dataset is only 10,000 images. I use normalization for preprocessing the data.
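For reference, a minimal sketch of how such a subset and normalization could be built (the question does not show this code, so the details here are assumptions):
import numpy as np
import tensorflow as tf
# load full CIFAR-10 and keep the first 1,000 images of each class
(x_full, y_full), _ = tf.keras.datasets.cifar10.load_data()
y_full = y_full.flatten()
idx = np.concatenate([np.where(y_full == c)[0][:1000] for c in range(10)])
np.random.shuffle(idx)
train_images = x_full[idx].astype('float32') / 255.0  # simple [0, 1] normalization
train_labels = y_full[idx]  # integer (sparse) labels 0-9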
First of all, the problem is the loss. Your dataset poses a multi-class problem, not a binary one and not a multi-label one.
As stated here:
The classes are completely mutually exclusive. There is no overlap
between automobiles and trucks. "Automobile" includes sedans, SUVs,
things of that sort. "Truck" includes only big trucks. Neither
includes pickup trucks.
In this situation the categorical crossentropy is the suggested loss. Keep in mind that if your labels are sparse (encoded as integers between 0 and 9) and not one-hot encoded vectors ([0, 0, 0, ..., 1, 0, 0]), you should use the sparse categorical crossentropy.
not sparse (labels one-hot encoded as vectors, e.g. [0, 0, 1, ..., 0]):
cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
loss='categorical_crossentropy',
metrics=['accuracy'])
sparse (labels encoded as integers in 0, ..., 9):
cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
Also, the learning rate is quite high (0.1). I'd suggest starting with something lower, for example 0.001.
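Putting both fixes together, a minimal sketch (assuming sparse integer labels, as produced by the standard CIFAR-10 loader):
LEARNING_RATE = 0.001
cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
loss='sparse_categorical_crossentropy',  # or 'categorical_crossentropy' with one-hot labels
metrics=['accuracy'])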
This post is also relevant to your problem.
Edit: my bad regarding the number of filters; it is indeed a common approach to use an increasing number of filters.
Related
I've been playing with Numer.ai data, mostly as a way to improve my understanding of neural nets, but I'm running into a problem that I can't seem to get past. No matter the configuration of my dense neural net, the output comes out in a tight range.
The input is 300 scaled feature columns (0 to 1) and the target is between 0 and 1 (values of 0, 0.25, 0.5, 0.75, and 1).
Here is my fully reproducible code:
import pandas as pd
# load data
training_data = pd.read_csv("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_training_data.csv.xz")
tournament_data = pd.read_csv("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_tournament_data.csv.xz")
feature_cols = training_data.columns[training_data.columns.str.startswith('feature')]
# select those columns out of the training dataset
X_train = training_data[feature_cols].to_numpy()
# select target variables
y_train = training_data.loc[:,'target'].to_numpy()
#same thing on validation data
val_data = tournament_data[tournament_data.data_type=='validation']
X_val = val_data[feature_cols]
y_val= val_data.loc[:,'target']
I've tried a number of different configurations in my neural network:
different optimizers: adam and sgd
different learning rates, from 0.01 down to 0.0001
different neuron sizes
adding dropout (although I didn't expect this to work, because it seems to be a problem with bias, not variance)
linear, softmax, and sigmoid final-layer activation functions (softmax produces negative values, so that was an immediate non-starter)
different batch sizes, as small as 16 and as large as 256
adding or removing batch normalization
shuffling the input data
training for different numbers of epochs
Ultimately, the results are one of two things:
Predicted values are all the same number, usually somewhere in the 0.45 to 0.55 area
Predicted values are in a very narrow range, usually not more than 0.05 different. So the values are 0.45 to 0.55
I can't figure out what configuration changes I need to make to get this neural network to output predictions across a broader area of the 0 to 1 range.
from tensorflow.keras import models, layers
dropout_rate = 0.15
model = models.Sequential()
model.add(layers.Dense(512, input_shape=(X_train.shape[1],)))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1028, activation = 'relu', kernel_regularizer='l2'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
loss='binary_crossentropy',metrics=['mae', 'mse'])
history = model.fit(X_train, y_train,
validation_data=(X_val, y_val),
batch_size=64,
epochs=200,
verbose=1)
# Prediction output
predictions_df = model.predict(X_val)
predictions_df = predictions_df.reshape(len(predictions_df))
pred_max = predictions_df.max()
pred_min = predictions_df.min()
pred_range = pred_max - pred_min
print(pred_max, pred_min, pred_range)
# example output: 0.51895267 0.47968164 0.039271027
EDIT:
There is an impact on the predictions when the following changes are made (tests run with a batch size of 512 and 5 epochs; the results below are on training data only) -
Loss set to mse instead of binary_crossentropy
Batch size 512 (for quick prototyping)
Epochs set to 5 (loss flattens after that)
Remove l2 regularization, and increase dropout
Set output activation -
With sigmoid -> Max: 0.60, Min: 0.36
Without activation -> Max: 0.69, Min: 0.29
With relu -> Max: 0.73, Min: 0.10
Here is the code for testing purposes -
from tensorflow.keras import models, layers
dropout_rate = 0.50
model = models.Sequential()
model.add(layers.Dense(512, input_shape=(X_train.shape[1],)))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1024, activation = 'relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1, activation='relu'))
model.compile(optimizer='adam',
loss='mse',metrics=['mae'])
history = model.fit(X_train, y_train,
#validation_data=(X_val, y_val),
batch_size=512,
epochs=5,
verbose=1)
# Prediction output
predictions_df = model.predict(X_train)
predictions_df = predictions_df.reshape(len(predictions_df))
pred_max = predictions_df.max()
pred_min = predictions_df.min()
pred_range = pred_max - pred_min
print(pred_max, pred_min, pred_range)
0.73566914 0.1063129 0.62935627
Proposed solutions
You are trying to solve a regression problem, predicting a value between 0 and 1 (taking the values 0, 0.25, 0.5, 0.75, and 1), but you are treating it as a binary classification problem by using a sigmoid activation and a binary_crossentropy loss.
What you may want to try is using mse and/or removing any output activation (or better, use relu as suggested by #desertnaut). You could simply be underfitting as suggested by #xdurch0. Try with and without the regularization as well.
model = models.Sequential()
model.add(layers.Dense(512, input_shape=(X_train.shape[1],)))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1028, activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1))
model.compile(optimizer='adam', loss='mse')
Check this table to help you with how to use losses and activations for different types of problem settings.
On a side note, given the discrete nature of the values in your dependent variable y, you could also consider reframing the problem as a multi-class, single-label classification problem, if the downstream task allows it.
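A minimal sketch of that reframing, assuming the targets really only take the five values 0, 0.25, 0.5, 0.75 and 1 (the layer sizes here are illustrative):
import numpy as np
from tensorflow.keras import models, layers
# map the five discrete target values to class indices 0..4
y_train_cls = np.rint(y_train * 4).astype(int)
clf = models.Sequential()
clf.add(layers.Dense(512, activation='relu', input_shape=(X_train.shape[1],)))
clf.add(layers.Dense(5, activation='softmax'))  # one class per discrete target value
clf.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
clf.fit(X_train, y_train_cls, batch_size=64, epochs=5)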
I am using the MNIST dataset (digits) and would like to use the mean squared error loss function; however, I get the following error:
ValueError: A target array with shape (60000, 1) was passed for an output of shape (None, 10) while using as loss mean_squared_error. This loss expects targets to have the same shape as the output.
this is my code:
Originally, I tried sparse_categorical_crossentropy
Code modified from: https://www.youtube.com/watch?v=wQ8BIBpya2k
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test,y_test) = mnist.load_data()
x_train = tf.keras.utils.normalize(x_train, axis = 1)
x_test = tf.keras.utils.normalize(x_test, axis = 1)
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28,28)),
tf.keras.layers.Dense(128, activation='sigmoid'),
tf.keras.layers.Dense(10, activation='sigmoid')
])
model.compile(optimizer='SGD',
loss='mean_squared_error',
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=3)
How can I reshape my data so that it works with MSE?
I guess you are missing something very important here. You are trying to use a loss designed for regression (mean squared error) for a classification task (predicting classes). These are two different kinds of tasks in the machine learning world.
If you'd like to try it anyway, just change your last layer to one output neuron with ReLU activation:
tf.keras.layers.Dense(1, activation='relu')
One output neuron and ReLU activation, because your labels are just the integer numbers from 0 to 9. Sigmoid gives you continuous values between 0 and 1, so it won't bring you any success in this case.
Keep in mind that your model no longer does classification; it will give you a continuous number between 0 and inf. So don't be surprised if you get e.g. 3.1415 as output when you feed an image of a 3 into your model. The model now tries to produce outputs as close as possible to the number in the label.
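Putting the suggestion together, a minimal sketch (same data pipeline as in the question; only the last layer, the loss target and the metric change):
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28,28)),
tf.keras.layers.Dense(128, activation='sigmoid'),
tf.keras.layers.Dense(1, activation='relu')  # one output neuron, regression-style
])
model.compile(optimizer='SGD',
loss='mean_squared_error',
metrics=['mae'])  # accuracy is not meaningful here
model.fit(x_train, y_train, epochs=3)  # y_train stays as the raw digit labels 0-9, shape (60000,)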
I have data like this:
There are 29 columns, out of which I have to predict winPlacePerc (at the extreme end of the dataframe), which ranges from 1 (high perc) to 0 (low perc).
Out of the 29 columns, 25 are numerical data, 3 are IDs (object), and 1 is categorical.
I dropped all the ID columns (since they are all unique) and one-hot encoded the categorical (matchType) data.
After doing all this I am left with 41 columns (after one-hot encoding).
This is how I am creating the data:
X = df.drop(columns=['winPlacePerc'])
#creating a dataframe with only the target column
y = df[['winPlacePerc']]
Now my X has 40 columns, and this is what my label data looks like:
> y.head()
winPlacePerc
0 0.4444
1 0.6400
2 0.7755
3 0.1667
4 0.1875
I also happen to have a very large amount of data, around 400k rows, so for testing purposes I am training on a fraction of it, using scikit-learn:
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.997, random_state=32)
which gives almost 13k rows for training.
For model I'm using Keras sequential model
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dense, Dropout, Activation
from keras.layers.normalization import BatchNormalization
from keras import optimizers
n_cols = X_train.shape[1]
model = Sequential()
model.add(Dense(40, activation='relu', input_shape=(n_cols,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error',
optimizer='Adam',
metrics=['accuracy'])
model.fit(X_train, y_train,
epochs=50,
validation_split=0.2,
batch_size=20)
Since my y-label data is between 0 and 1, I'm using a sigmoid layer as my output layer.
This is the training & validation loss & accuracy plot:
I also tried converting the labels into binary using a step function and the binary cross-entropy loss function.
After that, the y-label data looks like:
> y.head()
winPlacePerc
0 0
1 1
2 1
3 0
4 0
and changed the loss function:
model.compile(loss='binary_crossentropy',
optimizer='Adam',
metrics=['accuracy'])
This method was even worse than the previous one.
As you can see, it stops learning after a certain epoch, and this also happens when I take all the data rather than a fraction of it.
After this did not work, I also used dropout and tried adding more layers, but nothing works here.
Now my question: what am I doing wrong here? Is it the wrong layers or the data? How can I improve on this?
To clear things up - this is a regression problem, so using accuracy doesn't really make sense, because you will never be able to predict the exact value of, say, 0.23124.
First of all, you certainly want to normalise your values (not the one-hot encoded ones) before passing them to the network. Try using a StandardScaler as a start.
Second, I would recommend changing the activation function in the output layer - try linear, and mean_squared_error as the loss should be fine.
In order to validate your model's "accuracy", plot the predicted values together with the actual ones - this should give you a chance to validate the results visually (a sketch of this check follows the code below). That being said, your loss already looks quite decent.
Check this post; it should give you a good grasp of which activation and loss functions to use, and when.
from sklearn.preprocessing import StandardScaler
n_cols = X_train.shape[1]
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(n_cols,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error',
optimizer='Adam',
metrics=['mean_squared_error'])
model.fit(X_train, y_train,
epochs=50,
validation_split=0.2,
batch_size=20)
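A minimal sketch of the visual check mentioned above (predicted vs. actual), assuming matplotlib is available and reusing the fitted scaler and model from the snippet:
import matplotlib.pyplot as plt
X_test_scaled = ss.transform(X_test)  # reuse the scaler fitted on the training data
y_pred = model.predict(X_test_scaled).flatten()
plt.scatter(y_test.values.ravel(), y_pred, s=2, alpha=0.3)
plt.plot([0, 1], [0, 1], color='red')  # perfect-prediction reference line
plt.xlabel('actual winPlacePerc')
plt.ylabel('predicted winPlacePerc')
plt.show()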
Normalize data
Add more depth to your network
Make the last layer linear
Accuracy is not a good metric for regression. Let's look at an example:
predictions: [0.9999999, 2.0000001, 3.000001]
ground truth: [1, 2, 3]
Accuracy = no. of correct / total = 0/3 = 0
Accuracy is 0, but the predictions are pretty close to the ground truth. On the other hand, the MSE will be very low, indicating that the deviation of the predictions from the ground truth is very small.
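The same point as a quick check in numpy:
import numpy as np
predictions = np.array([0.9999999, 2.0000001, 3.000001])
ground_truth = np.array([1.0, 2.0, 3.0])
accuracy = np.mean(predictions == ground_truth)  # 0.0 -- no prediction is exactly equal
mse = np.mean((predictions - ground_truth) ** 2)  # ~3.4e-13 -- essentially a perfect fit
print(accuracy, mse)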
My model is experiencing wild and big fluctuations in the validation loss and does not converge.
I am doing an image recognition project with my three dogs, i.e. classifying which dog is in the image. Two dogs are very similar and the third is very different. I took 10-minute videos of each dog, separately. Frames were extracted as images at each second. My dataset consists of about 1800 photos, 600 of each dog.
This block of code is responsible for augmenting and creating the data to feed the model.
randomize = np.arange(len(imArr)) # imArr is the numpy array of all the images
np.random.shuffle(randomize) # Shuffle the images and labels
imArr = imArr[randomize]
imLab= imLab[randomize] # imLab is the array of labels of the images
lab = to_categorical(imLab, 3)
gen = ImageDataGenerator(zoom_range = 0.2,horizontal_flip = True , vertical_flip = True,validation_split = 0.25)
train_gen = gen.flow(imArr,lab,batch_size = 64, subset = 'training')
test_gen = gen.flow(imArr,lab,batch_size =64,subset = 'validation')
This picture is the result of the model below.
model = Sequential()
model.add(Conv2D(16, (11, 11),strides = 1, input_shape=(imgSize,imgSize,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides = 2))
model.add(BatchNormalization(axis=-1))
model.add(Conv2D(32, (5, 5),strides = 1))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides = 2))
model.add(BatchNormalization(axis=-1))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides = 2))
model.add(BatchNormalization(axis=-1))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(BatchNormalization(axis=-1))
model.add(Dropout(0.3))
#Fully connected layer
model.add(Dense(256))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(3))
model.add(Activation('softmax'))
sgd = SGD(lr=0.004)
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
batch_size = 64
epochs = 100
model.fit_generator(train_gen, steps_per_epoch=(len(train_gen)), epochs=epochs, validation_data=test_gen, validation_steps=len(test_gen),shuffle = True)
Things I have tried:
High/low Learning rate ( 0.01 -> 0.0001)
Increase Dropout to 0.5 in both Dense layers
Increase/Decrease size of both Dense Layers ( 128 min -> 4048 max)
Increased number of CNN layers
Introduced Momentum
Increased/Decreased Batch Size
Things I have not tried:
I have not used any other loss or metric
I have not used any other optimiser.
Have not adjusted any parameters of the CNN layers
It seems that there is some form of randomness or too many parameters in my model. I am aware that it is currently overfitting, but that should not be the cause of the volatility(?).
I am not too worried about the performance of the model. I would like to achieve about a 70% accuracy. All I want to do now is to stabilise the validation accuracy and to converge.
Note:
At some epochs, the training loss is very low (< 0.1) but the validation loss is very high (> 3).
The videos are taken on different backgrounds, but +- the same amount on each background for each dog.
Some images are a bit blurry.
Change the optimizer to Adam; it's definitely better. In your code you are already using it, but with default parameters: you create an SGD optimizer, yet in the compile line you pass an Adam with no parameters. Play with the actual parameters of your optimizer.
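For example (a sketch; depending on the Keras version the argument may be lr instead of learning_rate):
from keras.optimizers import Adam
# pass the optimizer explicitly so its parameters can actually be tuned
model.compile(loss='categorical_crossentropy',
optimizer=Adam(learning_rate=1e-4),
metrics=['accuracy'])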
I encourage you to take out the dropout first and see what happens; then, if you manage to overfit, start with a low dropout and go up.
Also, it might be that some of your validation samples are very hard to classify and thus increase the loss. Maybe take out the shuffling in the validation set and watch for any periodicities, to try to find out whether there are validation samples that are hard to classify.
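For example, a sketch using the same generator as in the question:
# keep validation batches in a fixed order so loss spikes can be traced to specific samples
test_gen = gen.flow(imArr, lab, batch_size=64, subset='validation', shuffle=False)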
Hope it helps!
I see you have tried a lot of different things. A few suggestions:
I see you use large filters in your Conv2D layers, e.g. 11x11 and 5x5. If your image dimensions are not very big, you should definitely go for smaller filter sizes like 3x3 (see the sketch after these suggestions).
Try different optimizers, try Adam with varying lr if you haven't.
Otherwise, I don't see many problems. Maybe you need more data for the network to learn better.
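A sketch of the first suggestion, with the large filters replaced by 3x3 ones; the filter counts are kept from the original model and the remaining layers are unchanged:
model = Sequential()
model.add(Conv2D(16, (3, 3), strides=1, input_shape=(imgSize, imgSize, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=2))
model.add(BatchNormalization(axis=-1))
model.add(Conv2D(32, (3, 3), strides=1))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=2))
model.add(BatchNormalization(axis=-1))
# ... the remaining layers as in the original model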
I am building a model in Keras with input data X of variable length (N_sample, 50, 128). Each sample has 50 time steps, and at each time step I have 128 features. However, I have used zero-padding to generate the input X, because not all samples have 50 time steps.
There are two ways of padding zeros (sketched in the code below):
For each sample, I feed the true data, say (20, 128), at the beginning, and pad the remaining (30, 128) with zeros.
I pad the first 30 rows with zeros and put the data in the last 20 rows.
I then use sample_weight to assign a zero weight to the padded time steps.
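A minimal sketch of the two padding schemes, assuming the raw data is a list of arrays of shape (T_i, 128) with T_i <= 50 (the name raw_sequences is illustrative):
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
# option 1: true data first, zeros appended at the end
X_post = pad_sequences(raw_sequences, maxlen=50, dtype='float32', padding='post')
# option 2: zeros first, true data at the end
X_pre = pad_sequences(raw_sequences, maxlen=50, dtype='float32', padding='pre')
# temporal sample weights: 1 for real time steps, 0 for padded ones
# (assumes no real time step is entirely zero)
sample_weight_post = (np.abs(X_post).sum(axis=-1) > 0).astype('float32')  # shape (N_sample, 50)
sample_weight_pre = (np.abs(X_pre).sum(axis=-1) > 0).astype('float32')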
However, with these two settings I get completely different AUC on the test set. What happens if zero-padded steps are fed before rather than after the true data in an LSTM network with sample_weights? Is it due to the initialization of the hidden state in the LSTM?
How would I know which is correct? Thank you.
My model is as below:
model = Sequential()
model.add(TimeDistributed(Dense(64, activation='sigmoid'), input_shape=(50, 128)))
model.add(LSTM(32, return_sequences=True))
model.add(TimeDistributed(Dense(8, activation='sigmoid')))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(loss='binary_crossentropy', optimizer='rmsprop',sample_weight_mode='temporal', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=2, sample_weight=Sample_weight_train)