I am trying to build a regression model, but the MSE and MAE are very high. I filter and normalize the data (both input and output, for both the train and test sets). I think the problem is that I have very high values in one column: the minimum is 1 and the maximum is 9,100,000 (before normalizing), but I actually need to predict these high values.
The model looks like this: I have 6 input columns and 800,000 rows. I have tried more neurons and layers, and swapping out the sigmoid activation, but the loss and the error stay around 0.8 for MSE and 0.3 for MAE. The predictions are also much lower than they should be, never reaching the high values.
model = Sequential()
model.add(Dense(7, input_dim=num_input, activation='relu'))
model.add(Dense(7, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mse', optimizer='rmsprop', metrics=['mse', 'mae'])
history = model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, validation_data=(x_val, y_val))
A few remarks and suggestions:
RMSProp is generally not used with fully connected layers; I recommend switching to Adam or SGD.
If you have a skewed distribution with many large values, you might consider using the log of these values instead (see the sketch after these remarks).
First try a shallow model with few neurons. Then gradually increase the number of neurons in order to overfit the dataset; you should be able to reach a perfect score on the train set. At that point you can start decreasing the number of neurons and add layers with dropout to improve generalisation.
As already mentioned in the comments, the output activation for regression should be "linear". Sigmoid is for binary classification.
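A minimal sketch of those two changes combined (log-transforming the raw target values and using a linear output), reusing the variables from the question; the 64-unit layer sizes and the Adam optimizer are assumptions, not a prescription:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Train on log-scaled targets so the very large values don't dominate the loss.
y_train_log = np.log1p(y_train)
y_val_log = np.log1p(y_val)

model = Sequential()
model.add(Dense(64, input_dim=num_input, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='linear'))  # linear output for regression
model.compile(loss='mse', optimizer='adam', metrics=['mse', 'mae'])

history = model.fit(x_train, y_train_log, epochs=epochs, batch_size=batch_size,
                    validation_data=(x_val, y_val_log))

# Map predictions back to the original scale.
preds = np.expm1(model.predict(x_val).ravel())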
Currently I'm trying to train a Keras Sequential network with the pooled output from BERT. The fine-tuned BertForSequenceClassification yields good results, but using the pooled_output in a plain neural network does not work as intended. As input data I have 10,000 samples, each consisting of the 768 floats that my BERT model provides. I'm trying to do a simple binary classification, so I also have the labels as 1s and 0s.
As you can see, my data has a good number of examples for both classes. After shuffling them, I do a normal train/test split and create/fit my model with:
model = Sequential()
model.add(Dense(1536, input_shape=(768,), activation='relu'))
model.add(Dense(1536, activation='relu'))
model.add(Dense(1536, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
opt = Adam(learning_rate=0.0001)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
#Normally with early stopping so quite a few epochs
history = model.fit(train_features, train_labels, epochs=800, batch_size=68, verbose=1,
                    validation_split=0.2, callbacks=[])
During training the loss decreases and my accuracy increases as expected. BUT the val_loss increases and the val_accuracy stays the same! Sure, I'm overfitting, but I would expect the val_accuracy to increase, at least for a few epochs, and only then decrease once I'm overfitting.
Does anyone have an idea what I'm doing wrong? Perhaps 10,000 samples aren't enough to generalize?
The model is overfitting as expected, but I am surprised it starts overfitting in the early epochs, which makes me wonder if you have some mislabeling in your validation set. At any rate, try changing the model as follows:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(1536, input_shape=(768,), activation='relu'))
model.add(Dropout(.3))
model.add(Dense(512, activation='relu'))
model.add(Dropout(.3))
model.add(Dense(128, activation='relu'))
model.add(Dropout(.3))
model.add(Dense(1, activation='sigmoid'))
See if this reduces the overfitting problem.
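A possible way to compile and fit this revised model, keeping the early stopping mentioned in the question; the optimizer settings, batch size, and validation split are copied from the question, while the import paths and the patience value are assumptions:

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

opt = Adam(learning_rate=0.0001)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])

# Stop once val_loss stops improving and keep the best weights seen so far.
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
history = model.fit(train_features, train_labels, epochs=800, batch_size=68, verbose=1,
                    validation_split=0.2, callbacks=[early_stop])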
It turned out it was not just mislabeling in my validation set, but in my whole dataset.
I take a sample of 100,000 entries:
train_df = train_df.sample(frac=1).reset_index(drop=True)
train_df = train_df.iloc[0:100000]
and delete some values:
train_df = train_df[train_df['label'] != '-']
After that I set a few values using train_df.at in a loop, but some index labels no longer exist because I deleted those rows. train_df.at only throws warnings, so I did not see this. I also mixed .loc and .iloc: in my case I selected .iloc[2:3], but the label 2 does not exist, so it returns the row with label 3, which sits at position 2. After that I make my changes and train_df.at fails to insert at label 2, but my loop goes on. In the next iteration .iloc returns label 4 at position 3, and my loop then puts the data at label 3 - from then on, all my labels are one position off.
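A minimal sketch of that pitfall with made-up data: after dropping rows, .iloc addresses rows by position while .at and .loc address them by label, so resetting the index keeps the two aligned:

import pandas as pd

df = pd.DataFrame({'label': ['a', 'b', '-', 'c', 'd']})
df = df[df['label'] != '-']     # index labels are now 0, 1, 3, 4

print(df.iloc[2:3])             # position 2 -> the row whose label is 3
# df.at[2, 'label'] = 'x'       # would address label 2, which no longer exists

df = df.reset_index(drop=True)  # realign labels with positions: 0, 1, 2, 3
print(df.iloc[2:3])             # position 2 -> the row whose label is 2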
I am doing text classification. My dataset is 16,000 KB, and my problem is that I get 95% accuracy in training but only 90% in testing. Can I increase the testing accuracy, and how?
Here is my code:
model = Sequential()
model.add(Conv1D(filters=256, kernel_size=5, activation='relu', input_shape=(7, 1)))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(Dense(11, activation='softmax'))
model.summary()
model.compile(Adam(lr=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    epochs=200,
                    verbose=True,
                    validation_data=(X_test, y_test),
                    batch_size=128)
loss, accuracy = model.evaluate(X_train, y_train, verbose=True)
print("Training Accuracy: {:.4f}".format(accuracy))
loss, accuracy = model.evaluate(X_test, y_test, verbose=False)
print("Testing Accuracy: {:.4f}".format(accuracy))
The first step in debugging the model is to plot the training/validation curves, like the example below.
Typical training validation curve
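A minimal sketch for producing such curves from the history object returned by model.fit above (it assumes the 'accuracy' metric and the validation_data passed in the question's code):

import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.legend()
plt.show()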
Based on how the curves behave, there are a few possible inferences and solutions.
The two curves diverge as the model is trained: training keeps improving, while testing either gets worse or saturates much earlier than training.
Cause: the model is overfitting the training data and needs regularisation, e.g. dropout, weight decay, etc.
The two curves stick close together at the end and no further improvements happen.
Cause: the model is saturated or stuck in a local minimum. Try increasing the learning rate to push it out of the minimum; if there are still no major improvements, try adding more complexity to the model.
The two curves have saturated at the end but remain a small distance apart, and no major changes happen with further training.
Cause: the model has learned what it could from the available data and will not improve any further. Try data transformations to generate new data, or get more data.
I am using tensorflow and keras for a binary classification problem.
I have only 121 samples but 20,000 features. I know that's too few samples and too many features, but it's a biological problem (gene-expression data), so I have to deal with it.
My question: why does the accuracy (train and test) go up to 100%, then drop, then increase again, while the loss keeps decreasing the whole time?
Accuracy plot:
Validation plot:
Since my dataset is only 118 samples big, I have only 24 test data points. See the confusion matrix:
This is my neural network architecture:
with current settings:
{'ann__dropout_rate': 0.4, 'ann__learning_rate': 0.01, 'ann__n_neurons': 16, 'ann__num_hidden': 1, 'ann__regularization_rate': 0.6}
model = Sequential()
model.add(Dense(input_shape, activation="relu",
                input_dim=input_shape))  # First Layer
model.add(Dense(n_neurons, activation="relu",
                kernel_regularizer=tf.keras.regularizers.l1(regularization_rate)))
model.add(Dropout(dropout_rate))
model.add(Dense(1, activation="sigmoid"))
optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
model.compile(loss="binary_crossentropy",
              optimizer=optimizer, metrics=['accuracy'])
return model
Thank you!
Try to shuffle your training data if you are not doing so already. You might also try a larger batch size. I also recommend using the ReduceLROnPlateau callback in model.fit (see the Keras documentation). Set it up to monitor the validation loss and to reduce the learning rate by a factor < 1 if the loss fails to improve after patience epochs.
I implemented your ideas, @Gerry P (shuffle=True and ReduceLROnPlateau; the batch size is 64). My callbacks are now:
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, min_lr=1e-6, verbose=1)
early_stop = EarlyStopping(monitor='val_loss', min_delta=0, patience=20, mode='auto')
My accuracy and loss now look like this:
I would say it's still overfitting.
Confusion-Matrix:
I am using tensorflow with keras to perform regression on some historical data. The data looks as follows:
id,timestamp,ratio
"santalucia","2018-07-04T16:55:59.020000",21.8
"santalucia","2018-07-04T16:50:58.043000",22.2
"santalucia","2018-07-04T16:45:56.912000",21.9
"santalucia","2018-07-04T16:40:56.572000",22.5
"santalucia","2018-07-04T16:35:56.133000",22.5
"santalucia","2018-07-04T16:30:55.767000",22.5
And I am reformulating it as a time-series problem (25 time steps) so that I can predict (regress) the next values of the series (the variance should not be high). I am also using sklearn.preprocessing.MinMaxScaler to scale the data to the range (-1, 1) or (0, 1), depending on whether I use LSTM or Dense layers (respectively); a small sketch of this scaling step is below.
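For reference, a minimal sketch of that scaling step; train_ratio and test_ratio are assumed to be 1-D numpy arrays of the ratio column, and the scaler should be fitted on the training portion only:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(-1, 1))   # use (0, 1) for the Dense model
train_scaled = scaler.fit_transform(train_ratio.reshape(-1, 1))
test_scaled = scaler.transform(test_ratio.reshape(-1, 1))

# Predictions can later be mapped back to the original scale:
# preds = scaler.inverse_transform(preds_scaled.reshape(-1, 1))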
I am training with two different architectures:
Dense is as follows:
def get_model(self, layers, activation='relu'):
    model = Sequential()
    # Input arrays of shape (*, layers[1])
    # Output = arrays of shape (*, layers[1] * 16)
    model.add(Dense(units=int(64), input_shape=(layers[1],), activation=activation))
    model.add(Dense(units=int(64), activation=activation))
    # model.add(Dropout(0.2))
    model.add(Dense(units=layers[3], activation='linear'))
    # activation=activation))
    # opt = optimizers.Adagrad(lr=self.learning_rate, epsilon=None, decay=self.decay_lr)
    opt = optimizers.rmsprop(lr=0.001)
    model.compile(optimizer=opt, loss=self.loss_fn, metrics=['mae'])
    model.summary()
    return model
This more or less gives good results (same architecture as in TensorFlow's tutorial for predicting house prices).
However, the LSTM is not giving good results; it usually ends up stuck around a single value (for example, around 40: 40.0123123, 40.123123, 41.09090...) and I do not see why or how to improve it. The architecture is:
def get_model(self, layers, activation='tanh'):
    model = Sequential()
    # Shape = (Samples, Timesteps, Features)
    model.add(LSTM(units=128, input_shape=(layers[1], layers[2]),
                   return_sequences=True, activation=activation))
    model.add(LSTM(64, return_sequences=True, activation=activation))
    model.add(LSTM(layers[2], return_sequences=False, activation=activation))
    model.add(Dense(units=layers[3], activation='linear'))
    # activation=activation))
    opt = optimizers.Adagrad(lr=0.001, decay=self.decay_lr)
    model.compile(optimizer=opt, loss='mean_squared_error', metrics=['accuracy'])
    model.summary()
    return model
I currently train with a batch size of 200 that increases by a factor of 1.5 every fit. Each fit consists of 50 epochs, and I use a Keras EarlyStopping callback with a patience of at least 20 epochs.
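For reference, the early-stopping setup described above might look roughly like this; monitoring val_loss, the patience of 20, and the variable names are assumptions based on the description:

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)
history = model.fit(x_train, y_train, epochs=50, batch_size=200,
                    validation_data=(x_val, y_val), callbacks=[early_stop])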
I have tried adding more layers and more units, reducing layers and units, and increasing and decreasing the learning rate, but every time it gets stuck around a value. Any reason for this?
Also, do you know any good practices that can be applied to this problem?
Cheers
Have you tried holding back a validation set and seeing how well the model's performance on the training set tracks the validation set? This is often how I catch myself overfitting.
A simple function for doing this (adapted from here) can help you do that:
import matplotlib.pyplot as plt

hist = model.fit_generator(...)

def gen_graph(history, title):
    plt.plot(history.history['categorical_accuracy'])
    plt.plot(history.history['val_categorical_accuracy'])
    plt.title(title)
    plt.show()

gen_graph(hist, "Accuracy, training vs. validation scores")
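To get those val_* histories in the first place, hold back a validation set as suggested above. A minimal sketch, assuming X and y are numpy arrays and that a 20% split is acceptable:

from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Keras records the val_* metrics plotted above when validation data is supplied.
hist = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50)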
Also, do you have enough samples? If you're really, really sure that you have done as much as you can in terms of preprocessing, and in terms of hyperparameter tuning... generating some synthetic data or doing some data augmentation has occasionally helped me.
I'm trying to make an autoencoder using Keras with a TensorFlow backend. In particular, I have data of a vector of n_components (i.e. 200) sampled n_times (i.e. 20000). It is key that when I train on time t, I compare it only to time t. It appears that it is shuffling the sampling times. I removed the bottleneck and found that the network is doing a pretty bad job of predicting the n_components, instead representing something more like the mean of the input scaled by each component.
Here is my network with the bottleneck commented out:
model = keras.models.Sequential()
# Make a 7-layer autoencoder network
model.add(keras.layers.Dense(n_components, activation='relu', input_shape=(n_components,)))
model.add(keras.layers.Dense(n_components, activation='relu'))
# model.add(keras.layers.Dense(50, activation='relu'))
# model.add(keras.layers.Dense(3, activation='relu'))
# model.add(keras.layers.Dense(50, activation='relu'))
model.add(keras.layers.Dense(n_components, activation='relu'))
model.add(keras.layers.Dense(n_components, activation='relu'))
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['accuracy'])
# act is a numpy matrix of size (n_components, n_times)
model.fit(act.T, act.T, epochs=15, batch_size=100, shuffle=False)
newact = model.predict(act.T).T
I have tested shuffling the second dimension of act (n_times) and passing it as model.fit(act.T, act_shuffled.T), and I see no difference from model.fit(act.T, act.T). Am I doing something wrong? How can I force it to learn from the specific time?
Many thanks,
Arthur
I believe that I have solved the problem, but more knowledgeable users of Keras might be able to correct me. I had tried many different values for the argument batch_size of fit, but I didn't try a value of 1. When I changed it to 1, it did a good job of reproducing the input data.
I believe that the batch size, even if shuffle is set to False, allows the autoencoder to train one input time against an unrelated input time.
So, I have amended my code to:
model.fit(act.T, act.T, epochs=15, batch_size=1, shuffle=False)