I'm new to machine learning and deep learning and I'm trying to classify texts from 5 categories using neural networks. For that, I made a dictionary in order to translate the words to indexes, finally getting an array with lists of indexes. Moreover I change the labels to integers. I also did the padding and that stuff. The problem is that when I fit the model the accuracy keeps quite low (~0.20) and does not change across the epochs. I have tried to change a lot of params, like the size of the vocabulary, number of neurones, dropout probability, optimizer parameter, etc. The key parts of the code are below.
# Arrays with indexes (that works fine)
X_train = tokens_to_indexes(tokenized_tr_mrp, vocab, return_vocab=False)
X_test, vocab_dict = tokens_to_indexes(tokenized_te_mrp, vocab)
# Labels to integers
labels_dict = {}
labels_dict['Alzheimer'] = 0
labels_dict['Bladder Cancer'] = 1
labels_dict['Breast Cancer'] = 2
labels_dict['Cervical Cancer'] = 3
labels_dict['Negative'] = 4
y_train = np.array([labels_dict[i] for i in y_tr])
y_test = np.array([labels_dict[i] for i in y_te])
# One-hot encoding of labels
from keras.utils import to_categorical
encoded_train = to_categorical(y_train)
encoded_test = to_categorical(y_test)
# Padding
max_review_length = 235
X_train_pad = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test_pad = sequence.pad_sequences(X_test, maxlen=max_review_length)
# Model
# Vocab size
top_words = len(list(vocab_dict.keys()))
# Neurone type
rnn = LSTM
# dropout
set_dropout = True
p = 0.2
# embedding size
embedding_vector_length = 64
# regularization strength
L = 0.0005
# Number of neurones
N = 50
# Model
model = Sequential()
# Embedding layer
model.add(Embedding(top_words,
embedding_vector_length,
embeddings_regularizer=regularizers.l1(l=L),
input_length=max_review_length
#,embeddings_constraint=UnitNorm(axis=1)
))
# Dropout layer
if set_dropout:
model.add(Dropout(p))
# Recurrent layer
model.add(rnn(N))
# Output layer
model.add(Dense(5, activation='softmax'))
# Compilation
model.compile(loss='categorical_crossentropy',
optimizer=Adam(lr=0.001),
metrics=['Accuracy'])
# Split training set for validation
X_tr, X_va, y_tr_, y_va = train_test_split(X_train_pad, encoded_train,
test_size=0.3, random_state=2)
# Parameters
batch_size = 50
# N epochs
n_epocas = 20
best_val_acc = 0
best_val_loss = 1e20
best_i = 0
best_weights = []
acum_tr_acc = []
acum_tr_loss = []
acum_val_acc = []
acum_val_loss = []
# Training
for e in range(n_epocas):
h = model.fit(X_tr, y_tr_,
batch_size=batch_size,
validation_data=(X_va, y_va),
epochs=1, verbose=1)
acum_tr_acc = acum_tr_acc + h.history['accuracy']
acum_tr_loss = acum_tr_loss + h.history['loss']
val_acc = h.history['val_accuracy'][0]
val_loss = h.history['val_loss'][0]
acum_val_acc = acum_val_acc + [val_acc]
acum_val_loss = acum_val_loss + [val_loss]
# if val_acc > best_val_acc:
if val_loss < best_val_loss:
best_i = len(acum_val_acc)-1
best_val_acc = val_acc
best_val_loss = val_loss
best_weights = model.get_weights().copy()
if len(acum_tr_acc)>1 and (len(acum_tr_acc)+1) % 1 == 0:
if e>1:
clear_output()
The code you posted is really bad practice.
You can either train for n_epocas using your current method and add callbacks to get the best weights (ex ModelCheckpoint) or use tf.GradientTape but using model.fit() for one epoch at a time can lead to weird results, since your optimizer doesn't know which epoch it is at.
I suggest keeping your current code but training for n_epocas all in one go and report the results here (accuracy + loss).
Someone gave me the solution. I just had to change this line:
model.compile(loss='categorical_crossentropy',
optimizer=Adam(lr=0.001),
metrics=['Accuracy'])
For this:
model.compile(loss='categorical_crossentropy',
optimizer=Adam(lr=0.001),
metrics=['acc'])
I also changed the lines in the final loop relating to accuracy. The one-hot encoding was necessary as well.
Related
I'm trying to train a basic Graph Neural Network using the StellarGraph library, in particular starting from the example provided in [0].
The example works fine, but now I would like to repeat the same exercize removing the N-Fold Crossvalidation and providing specific training, validation and test sets. I'm trying to do so with the following code:
# One hot encoding
graph_training_set_labels_encoded = pd.get_dummies(graphs_training_set_labels, drop_first=True)
graph_validation_set_labels_encoded = pd.get_dummies(graphs_validation_set_labels, drop_first=True)
graphs = graphs_training_set + graphs_validation_set
# Graph generator preparation
generator = PaddedGraphGenerator(graphs=graphs)
train_gen = generator.flow([x for x in range(0, len(graphs_training_set))],
targets=graph_training_set_labels_encoded,
batch_size=batch_size)
valid_gen = generator.flow([x for x in range(len(graphs_training_set),
len(graphs_training_set) + len(graphs_validation_set))],
targets=graph_validation_set_labels_encoded,
batch_size=batch_size)
# Stopping criterium
es = EarlyStopping(monitor="val_loss",
min_delta=0,
patience=20,
restore_best_weights=True)
# Model definition
gc_model = GCNSupervisedGraphClassification(layer_sizes=[64, 64],
activations=["relu", "relu"],
generator=generator,
dropout=dropout_value)
x_inp, x_out = gc_model.in_out_tensors()
predictions = Dense(units=32, activation="relu")(x_out)
predictions = Dense(units=16, activation="relu")(predictions)
predictions = Dense(units=1, activation="sigmoid")(predictions)
# Creating Keras model and preparing it for training
model = Model(inputs=x_inp, outputs=predictions)
model.compile(optimizer=Adam(adam_value), loss=binary_crossentropy, metrics=["acc"])
# GNN Training
history = model.fit(train_gen, epochs=num_epochs, validation_data=valid_gen, verbose=0, callbacks=[es])
# Calculate performance on the validation data
test_metrics = model.evaluate(valid_gen, verbose=0)
valid_acc = test_metrics[model.metrics_names.index("acc")]
print(f"Test Accuracy model = {valid_acc}")
Where graphs_training_set and graphs_validation_set are lists of StellarDiGraphs.
I am able to run this piece of code, but it provides NaN as result. What could be the problem?
Since it is the first time I am using StellarGraph and in particular PaddedGraphGenerator. I think my mistake rely on the usage of that generator, but providing training set and validation set in different manner didn't produce better results.
Thank you in advance.
UPDATE Fixed I typo in the code, as pointed out here (thanks to george123).
[0] https://stellargraph.readthedocs.io/en/stable/demos/graph-classification/gcn-supervised-graph-classification.html
I found a solution digging in the StellarGraph documentation for PaddedGraphGenerator and GCN Neural Network Class GCNSupervisedGraphClassification. Furthermore, I have found a similar question on StellarGraph Issue Tracker which also points out to the solution.
# Graph generator preparation
generator = PaddedGraphGenerator(graphs=graphs)
train_gen = generator.flow([x for x in range(0, num_graphs_for_training)],
targets=training_graphs_labels,
batch_size=35)
valid_gen = generator.flow([x for x in range(num_graphs_for_training, num_graphs_for_training + num_graphs_for_validation)],
targets=validation_graphs_labels,
batch_size=35)
# Stopping criterium
es = EarlyStopping(monitor="val_loss",
min_delta=0.001,
patience=10,
restore_best_weights=True)
# Model definition
gc_model = GCNSupervisedGraphClassification(layer_sizes=[64, 64],
activations=["relu", "relu"],
generator=generator,
dropout=dropout_value)
x_inp, x_out = gc_model.in_out_tensors()
predictions = Dense(units=32, activation="relu")(x_out)
predictions = Dense(units=16, activation="relu")(predictions)
predictions = Dense(units=1, activation="sigmoid")(predictions)
# Let's create the Keras model and prepare it for training
model = Model(inputs=x_inp, outputs=predictions)
model.compile(optimizer=Adam(adam_value), loss=binary_crossentropy, metrics=["acc"])
# GNN Training
history = model.fit(train_gen, epochs=num_epochs, validation_data=valid_gen, verbose=1, callbacks=[es])
# Evaluate performance on the validation data
valid_metrics = model.evaluate(valid_gen, verbose=0)
valid_acc = valid_metrics[model.metrics_names.index("acc")]
# Define test set indices temporary vars
index_begin_test_set = num_graphs_for_training + num_graphs_for_validation
index_end_test_set = index_begin_test_set + num_graphs_for_testing
test_set_indices = [x for x in range(index_begin_test_set, index_end_test_set)]
# Evaluate performance on test set
generator_for_test_set = PaddedGraphGenerator(graphs=graphs)
test_gen = generator_for_test_set.flow(test_set_indices)
result = model.predict(test_gen)
I am trying to build a word level lstm model using keras and for that I need to create one hot encoding for words to feed in the model. I have around 32360 words around 130,000 lines. Each time I attempt to run my model I run into a memory error.
I believe the issue is the size of the dataset. I have been researching this for a couple of days now and it seems the solution is to either: create a generator where I do the one hot encoding and then load my data in batches or to reduce the number of lines I attempt to feed into the model. I cannot quite figure out the generator piece.
The error I get is:
MemoryError: Unable to allocate 143. GiB for an array with shape
(1184643, 32360) and data type int32
Is the generator the correct way to go? Is there any way to solve this otherwise? My code is below:
vocab_size = len(tokenizer.word_index)+1
seq = []
for item in corpus:
seq_list = tokenizer.texts_to_sequences([item])[0]
for i in range(1, len(seq_list)):
n_gram = seq_list[:i+1]
seq.append(n_gram)
max_seq_size = max([len(s) for s in seq])
seq = np.array(pad_sequences(seq, maxlen=max_seq_size, padding='pre'))
input_sequences, labels = seq[:,:-1], seq[:,-1]
one_hot_labels = to_categorical(labels, num_classes=vocab_size, dtype='int32')
n_units = 256
embedding_size = 100
text_in = Input(shape = (None,))
x = Embedding(vocab_size, embedding_size)(text_in)
x = LSTM(n_units)(x)
x = Dropout(0.2)(x)
text_out = Dense(vocab_size, activation = 'softmax')(x)
model = Model(text_in, text_out)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
filepath="new_model_weights/weights-{epoch:02d}.hdf5"
checkpoint = ModelCheckpoint(filepath,
monitor='accuracy',
verbose=1,
save_best_only=False,
save_weights_only=False,
mode='auto',
save_freq='epoch')
callbacks_list = [checkpoint]
epochs = 50
batch_size = 10
history = model.fit(input_sequences, one_hot_labels, epochs=epochs, batch_size=batch_size, callbacks=callbacks_list, verbose=1)
My model weights (I output them to weights_before.txt and weights_after.txt) are precisely the same before and after the training, i.e. the training doesn't change anything, there's no fitting happening.
My data look like this (I basically want the model to predict the sign of feature, result is 0 if feature is negative, 1 if positive):
,feature,zerosColumn,result
0,-5,0,0
1,5,0,1
2,-3,0,0
3,5,0,1
4,3,0,1
5,3,0,1
6,-3,0,0
...
Brief summary of my approach:
Load the data.
Split it column-wise to x (feature) and y (result), split these two row-wise to test and validation sets.
Transform these sets into TimeseriesGenerators (not necessary in this scenario but I want to get this setup working and I don't see any reason why it shouldn't).
Create and compile simple Sequential model with few Dense layers and softmax activation on its output layer, use binary_crossentropy as loss function.
Train the model... nothing happens!
Complete code follows:
import keras
import pandas as pd
import numpy as np
np.random.seed(570)
TIMESERIES_LENGTH = 1
TIMESERIES_SAMPLING_RATE = 1
TIMESERIES_BATCH_SIZE = 1024
TEST_SET_RATIO = 0.2 # the portion of total data to be used as test set
VALIDATION_SET_RATIO = 0.2 # the portion of total data to be used as validation set
RESULT_COLUMN_NAME = 'feature'
FEATURE_COLUMN_NAME = 'result'
def create_network(csv_path, save_model):
before_file = open("weights_before.txt", "w")
after_file = open("weights_after.txt", "w")
data = pd.read_csv(csv_path)
data[RESULT_COLUMN_NAME] = data[RESULT_COLUMN_NAME].shift(1)
data = data.dropna()
x = data.ix[:, 1:2]
y = data.ix[:, 3]
test_set_length = int(round(len(x) * TEST_SET_RATIO))
validation_set_length = int(round(len(x) * VALIDATION_SET_RATIO))
x_train_and_val = x[:-test_set_length]
y_train_and_val = y[:-test_set_length]
x_train = x_train_and_val[:-validation_set_length].values
y_train = y_train_and_val[:-validation_set_length].values
x_val = x_train_and_val[-validation_set_length:].values
y_val = y_train_and_val[-validation_set_length:].values
train_gen = keras.preprocessing.sequence.TimeseriesGenerator(
x_train,
y_train,
length=TIMESERIES_LENGTH,
sampling_rate=TIMESERIES_SAMPLING_RATE,
batch_size=TIMESERIES_BATCH_SIZE
)
val_gen = keras.preprocessing.sequence.TimeseriesGenerator(
x_val,
y_val,
length=TIMESERIES_LENGTH,
sampling_rate=TIMESERIES_SAMPLING_RATE,
batch_size=TIMESERIES_BATCH_SIZE
)
model = keras.models.Sequential()
model.add(keras.layers.Dense(10, activation='relu', input_shape=(TIMESERIES_LENGTH, 1)))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.Dense(10, activation='relu'))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(1, activation='softmax'))
for item in model.get_weights():
before_file.write("%s\n" % item)
model.compile(
loss=keras.losses.binary_crossentropy,
optimizer="adam",
metrics=[keras.metrics.binary_accuracy]
)
history = model.fit_generator(
train_gen,
epochs=10,
verbose=1,
validation_data=val_gen
)
for item in model.get_weights():
after_file.write("%s\n" % item)
before_file.close()
after_file.close()
create_network("data/sign_data.csv", False)
Do you have any ideas?
The problem is that you are using softmax as the activation function of last layer. Essentially, softmax normalizes its input to make the sum of the elements to be one. Therefore, if you use it on a layer with only one unit (i.e. Dense(1,...)), then it would always output 1. To fix this, change the activation function of last layer to sigmoid which outputs a value in the range (0,1).
I'm trying to perform a hyperparameter optimization on a neural net, but as soon as I try a larger number of hidden layers, my neural network will always predict the same output, so my list of (negative) losses looks like:
-0.789302627913455
-0.789302627913455
-0.789302627913455
-1
-0.789302627913455
-0.789302627913455
-1
-0.789302627913455
-0.789302627913455
-0.789302627913455
this is my neural net:
def nn(learningRate, layers, neurons, dropoutIn, dropoutHidden, miniBatch, activationFun, epoch):
x_data = []
y_data = []
x_data_train = []
y_data_train = []
x_data_test = []
y_data_test = []
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
x_data = np.loadtxt('../4nodes_demand_vector')
x_data = x_data - x_data.min()
x_data = x_data / x_data.max() * 2
x_data = x_data - 1
y_data = np.loadtxt('../4nodes_vlink_vector')
input_dim = x_data.shape[1]
output_dim = y_data.shape[1]
split_ratio = 0.75
number_of_samples = x_data.shape[0]
# train data
x_data_train = x_data[:int(number_of_samples*split_ratio), ]
y_data_train = y_data[:int(number_of_samples*split_ratio), ]
# test data
x_data_test = x_data[int(number_of_samples*split_ratio):, ]
y_data_test = y_data[int(number_of_samples*split_ratio):, ]
adam = Adam(lr=learningRate)
model = Sequential()
model.add(Dropout(dropoutIn, input_shape=(input_dim,)))
model.add(Dense(units=neurons, input_shape=(input_dim,), kernel_constraint=maxnorm(3)))
for i in range(layers-1):
model.add(Dropout(dropoutHidden))
model.add(Dense(units=neurons, activation=activationFun, kernel_constraint=maxnorm(3)))
model.add(Dense(units=output_dim, activation='sigmoid'))
model.compile(loss='mean_squared_error',optimizer=adam)
model.fit(x_data_train, y_data_train, batch_size=miniBatch, validation_split=0.1, epochs=epoch, verbose=2)
predict = model.predict(x_data_test)
round_predict = np.round(predict)
correct = np.sum(np.all(round_predict == y_data_test, axis=1))
number_of_test_data = x_data_test.shape[0]
loss = -1.0 + (correct / float(number_of_test_data))
print("Loss: ", loss)
return loss
The neural net is trained on (unfortunately) private data, with 12 input neurons and 12 output neurons and I have 43000 data samples.
The idea of setting kernel_constraint to maxnorm(3) came from http://jmlr.org/papers/v15/srivastava14a.html as I was running in several NaN problems.
I know this question is from 2 years ago, but...
My guess is that the problem is because you are using a final activation of 'sigmoid' (typical for classification) with a loss function of 'mean_squared_error' (a regression loss)
Based on the final loss calculation it looks like you are trying to do a binary classification. So maybe try changing the loss function to binary crossentropy.
I'm trying to build a class to quickly initialize and train an autoencoder for rapid prototyping. One thing I'd like to be able to do is quickly adjust the number of epochs I train for. However, it seems like no matter what I do, the model trains each layer for 100 epochs! I'm using the tensorflow backend.
Here is the code from the two offending methods.
def pretrain(self, X_train, nb_epoch = 10):
data = X_train
for ae in self.pretrains:
ae.fit(data, data, nb_epoch = nb_epoch)
ae.layers[0].output_reconstruction = False
ae.compile(optimizer='sgd', loss='mse')
data = ae.predict(data)
.........
def fine_train(self, X_train, nb_epoch):
weights = [ae.layers[0].get_weights() for ae in self.pretrains]
dims = self.dims
encoder = containers.Sequential()
decoder = containers.Sequential()
## add special input encoder
encoder.add(Dense(output_dim = dims[1], input_dim = dims[0],
weights = weights[0][0:2], activation = 'linear'))
## add the rest of the encoders
for i in range(1, len(dims) - 1):
encoder.add(Dense(output_dim = dims[i+1],
weights = weights[i][0:2], activation = self.act))
## add the decoders from the end
decoder.add(Dense(output_dim = dims[len(dims) - 2], input_dim = dims[len(dims) - 1],
weights = weights[len(dims) - 2][2:4], activation = self.act))
for i in range(len(dims) - 2, 1, -1):
decoder.add(Dense(output_dim = dims[i - 1],
weights = weights[i-1][2:4], activation = self.act))
## add the output layer decoder
decoder.add(Dense(output_dim = dims[0],
weights = weights[0][2:4], activation = 'linear'))
masterAE = AutoEncoder(encoder = encoder, decoder = decoder)
masterModel = models.Sequential()
masterModel.add(masterAE)
masterModel.compile(optimizer = 'sgd', loss = 'mse')
masterModel.fit(X_train, X_train, nb_epoch = nb_epoch)
self.model = masterModel
Any suggestions on how to fix the problem would be appreciated. My original suspicion was that it was something to do with tensorflow, so I tried running with the theano backend but encountered the same problem.
Here is a link to the full program.
Following the Keras doc, the fit method uses a default of 100 training epochs (nb_epoch=100):
fit(X, y, batch_size=128, nb_epoch=100, verbose=1, callbacks=[], validation_split=0.0, validation_data=None, shuffle=True, show_accuracy=False, class_weight=None, sample_weight=None)
I'm sure how you are running these methods, but following the "Typical usage" from the original code, you should be able to run something like (adjusting the variable num_epoch as required):
#Typical usage:
num_epoch = 10
ae = JPAutoEncoder(dims)
ae.pretrain(X_train, nb_epoch = num_epoch)
ae.train(X_train, nb_epoch = num_epoch)
ae.predict(X_val)