I want to get the loss value for each training instance as the model trains.
history = model.fit(..)
For example, the code above returns the loss values for each epoch, not for each mini-batch or instance.
what is the best way to do this? Any suggestions?
Exactly what you are looking for is at the end of this official Keras documentation page: https://keras.io/callbacks/#callback
Here is the code to create a custom callback
import keras

class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(10, input_dim=784, kernel_initializer='uniform'))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
history = LossHistory()
model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=0, callbacks=[history])
print(history.losses)
# outputs
'''
[0.66047596406559383, 0.3547245744908703, ..., 0.25953155204159617, 0.25901699725311789]
'''
If you want to get loss values for each batch, you might want to call model.train_on_batch inside a generator. It's hard to provide a complete example without knowing your dataset, but you will have to break your dataset into batches and feed them one by one:
def make_batches(...):
    ...

batches = make_batches(...)
batch_losses = [model.train_on_batch(x, y) for x, y in batches]
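For concreteness, a minimal sketch of such a make_batches (a hypothetical helper, assuming x_train and y_train are NumPy arrays of equal length) might look like this:

def make_batches(batch_size, x, y):
    # Yield consecutive (x, y) slices of size batch_size (the last one may be smaller)
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]

batches = make_batches(128, x_train, y_train)
batch_losses = [model.train_on_batch(bx, by) for bx, by in batches]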
It's a bit more complicated with single instances. You can, of course, train on batches of size 1, though it will most likely thrash your optimiser (by maximising gradient variance) and significantly degrade performance. Besides, since loss functions are evaluated outside of Python's domain, there is no direct way to hijack the computation without tinkering with the C/C++ and CUDA sources. Even then, the backend evaluates the loss batch-wise (benefitting from highly vectorised matrix operations), so you would severely degrade performance by forcing it to evaluate the loss on each instance. Simply put, hacking the backend would only (probably) help you reduce GPU memory transfers compared with training on 1-sized batches from the Python interface. If you really want per-instance scores, I would recommend training on batches and evaluating on instances (this way you avoid issues with high variance and reduce expensive gradient computations, since gradients are only estimated during training):
def make_batches(batchsize, x, y):
    ...

batchsize = n
batches = make_batches(n, ...)
batch_instances = [make_batches(1, bx, by) for bx, by in batches]
losses = [
    (model.train_on_batch(*batch), [model.test_on_batch(*inst) for inst in instances])
    for batch, instances in zip(batches, batch_instances)
]
One solution is to compute the loss yourself between the training targets and the predictions on the training input. In the case of loss = mean_squared_error and three-dimensional outputs (i.e. image width x height x channels):
model.fit(train_in, train_out, ...)
pred = model.predict(train_in)
loss = np.add.reduce(np.square(train_out - pred), axis=(1, 2, 3))  # total squared error for each sample
loss = loss / (pred.shape[1] * pred.shape[2] * pred.shape[3])      # mean over each sample's entries
np.savetxt("loss.txt", loss)                                       # save the per-sample losses to a file
After combining resources from here and here, I came up with the following code. Maybe it will help you. The idea is that you can override the Callback class from Keras and then use the on_batch_end method to check the loss value from the logs that Keras supplies automatically to that method.
Here is working code for a neural network with that callback built in. Maybe you can start from here:
import numpy as np
import pandas as pd
import seaborn as sns
import os
import matplotlib.pyplot as plt
import time
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import Callback
# fix random seed for reproducibility
seed = 155
np.random.seed(seed)
# load pima indians dataset
# download directly from website
dataset = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data",
                      header=None).values
X_train, X_test, Y_train, Y_test = train_test_split(dataset[:,0:8], dataset[:,8], test_size=0.25, random_state=87)
class NBatchLogger(Callback):
    def __init__(self, display=100):
        '''
        display: Number of batches to wait before outputting loss
        '''
        self.seen = 0
        self.display = display

    def on_batch_end(self, batch, logs={}):
        self.seen += logs.get('size', 0)
        if self.seen % self.display == 0:
            print('\n{0}/{1} - Batch Loss: {2}'.format(self.seen, self.params['samples'],
                                                       logs.get('loss')))
out_batch = NBatchLogger(display=1000)
np.random.seed(seed)
my_first_nn = Sequential() # create model
my_first_nn.add(Dense(5, input_dim=8, activation='relu')) # hidden layer
my_first_nn.add(Dense(1, activation='sigmoid')) # output layer
my_first_nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
my_first_nn_fitted = my_first_nn.fit(X_train, Y_train, epochs=1000, verbose=0, batch_size=128,
                                     callbacks=[out_batch], initial_epoch=0)
Please let me know if this is what you were looking for.
I'm trying to implement a small prototype of the GCN model using the StellarGraph library. I have my StellarGraph graph object ready, and I'm trying to solve a multi-class, multi-label classification problem. This means I'm trying to predict more than one column (19 exactly), with each column encoded as either 0 or 1.
Here is what I've done:
from sklearn.model_selection import train_test_split
from stellargraph.mapper import FullBatchNodeGenerator
train_subjects, test_subjects = train_test_split(nodelist, test_size = .25)
generator = FullBatchNodeGenerator(graph, method="gcn")
from stellargraph.layer import GCN
train_gen = generator.flow(train_subjects['ID'], train_subjects.drop(['ID'], axis = 1))
gcn = GCN(layer_sizes=[16, 16], activations=["relu", "relu"], generator=generator, dropout=0.5)
from tensorflow.keras import layers, optimizers, losses, metrics, Model
x_inp, x_out = gcn.in_out_tensors()
predictions = layers.Dense(units = 1, activation="sigmoid")(x_out)
from tensorflow.keras.metrics import Precision as Precision
model = Model(inputs=x_inp, outputs=predictions)
model.compile(
    optimizer=optimizers.Adam(learning_rate=0.01),
    loss=losses.categorical_crossentropy,
    metrics=[Precision()])
val_gen = generator.flow(test_subjects['ID'], test_subjects.drop(['ID'], axis = 1))
from tensorflow.keras.callbacks import EarlyStopping
es_callback = EarlyStopping(monitor="val_precision", patience=200, restore_best_weights=True)
history = model.fit(
    train_gen,
    epochs=200,
    validation_data=val_gen,
    verbose=2,
    shuffle=False,
    callbacks=[es_callback])
I have 271045 edges and 16354 nodes in total, including 12265 training nodes. The issue I'm getting is a shape mismatch from Keras, shown below. I suspect it's due to passing multiple columns as target columns; I tried the model with only one column (class) and it worked perfectly.
InvalidArgumentError: Incompatible shapes: [1,12265] vs. [1,233035]
[[node LogicalAnd_1 (defined at tmp/ipykernel_52/2745570431.py:7) ]] [Op:__inference_train_function_1405]
It's worth mentioning that 233035 = 12265 (the number of training nodes) times 19 (the number of classes). Any idea what is going wrong here?
I figured out the problem.
It was a newbie mistake: I initialized the Dense classification layer with 1 unit instead of 19 (the number of classes), so the predictions had only 12265 entries instead of the 12265 x 19 = 233035 target entries.
I just needed to fix that line to:
predictions = layers.Dense(units = 19, activation="sigmoid")(x_out)
Have a nice day!
I'm writing here hoping to solve a problem I had with a neural network developed in Python using Keras.
I'm new to the deep learning world; I'm studying the theory and trying to implement some code.
Goal: develop a net that allows me to recognize 2 different words (commands) that I say [in the future they will be used to drive a small robot-car]. For now, I only want to achieve the identification of "yes/no".
Implementation: I'm trying to implement a binary classification network.
Here is my code:
I used librosa to convert the audio training and test sets into an input matrix with 193 features.
To overcome possible normalization problems, I scaled the data using the preprocessing package (I saw that this effectively improves and affects performance). I noticed that if I don't normalize the training, test and to-be-analyzed data with the same normalization, it doesn't work (see the sketch after these points).
I read that Keras accepts NumPy arrays as input, so I convert the target y into a NumPy array.
I proceed with the construction of the model, training and test (I know that I'm using methods that are actually deprecated).
I use one audio file to perform an additional test, because in the future I assume the net will receive (and judge) one audio file at a time.
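Regarding the normalization point above, a minimal illustrative sketch of what "using the same normalization" means (hypothetical, reusing variable names from the full code below):

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training features only, then reuse it everywhere
scaler = StandardScaler().fit(X_tot)      # X_tot: training feature matrix
X_train = scaler.transform(X_tot)
X_test = scaler.transform(X_tot_test)     # same scaler for the test set
X_analyze = scaler.transform(X)           # ...and for any new audio sample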
import support.myutilities as utilNP
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from sklearn.preprocessing import StandardScaler
#READ AND BUILD INPUT
X_si = utilNP.np_array_features_dir('DIRECTORY PATH')
X_no = utilNP.np_array_features_dir('DIRECTORY PATH')
X_tot = np.concatenate((X_si, X_no), axis=0)
# Scale the train set
scaler = StandardScaler().fit(X_tot)
X_train = scaler.transform(X_tot)
#0=si 1=no
y=[]
for i in range(len(X_si)):
    y.append(0)
for i in range(len(X_no)):
    y.append(1)
Y=np.array(y)
#READ AND BUILD TEST SET
X_si_test = utilNP.np_array_features_dir('DIRECTORY PATH')
X_no_test = utilNP.np_array_features_dir('DIRECTORY PATH')
X_tot_test = np.concatenate((X_si_test, X_no_test), axis=0)
# Scale the test set
scaler2 = StandardScaler().fit(X_tot_test)
X_test = scaler2.transform(X_tot_test)
y_test=[]
for i in range(len(X_si_test)):
    y_test.append(0)
for i in range(len(X_no_test)):
    y_test.append(1)
Y_test=np.array(y_test)
###### BUILD MODEL
model = Sequential()
model.add(Dense(100, input_dim=len(X_tot[0]), activation='relu')) #193 features as input
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='sigmoid')) #1 output
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
model.fit(X_train, Y, epochs=300, verbose=1)
#test
scores = model.evaluate(X_test, Y_test, verbose=0)
print('Accuracy on training data: {}% \n Error on training data: {}'.format(scores[1], 1 - scores[1]))
predictions = model.predict(X_test)
for i in range(len(predictions)):
    print('=> %d (expected %d)' % (predictions[i], y_test[i]))
#TEST WITH A PRACTICAL NEW SOUND: supposed acquired
file_name = 'PATH AUDIO'
X = utilNP.np_array_features(file_name)
#Normalize according to input data
X_analyze = scaler2.transform(X)
y_analysis=[]
y_analysis.append(1) # i supposed that the audio is the one that return 1
pred_test= model.predict(X_analyze)
scores2 = model.evaluate(X_analyze, np.array(y_analysis), verbose=0)
print('Accuracy on test data: {}% \n Error on test data: {}'.format(scores2[1], 1 - scores2[1]))
Problems:
Accuracy reaches 100% in very few epochs. It's true that the training set is not that big (a total of 300 samples, and 40 for test), but this result is clearly wrong. That said, if I use a number of epochs > 100, the net works well and does its job (in practice, the single case-study audio is recognized correctly).
If the number of epochs is low (20 for example), accuracy still reaches 100% after a few iterations, but the results contain errors (why are they not recognized?) and the final prediction is wrong. That's not normal: I would expect a low accuracy to justify the wrong answer, but it stays at 100%.
I tested a lot of solutions, from setting 'training=True/False' onwards, and read a lot of answers here and on Stack Exchange, but I haven't solved anything.
There is something wrong in my code?
Thanks in advance.
I don't understand why my Keras model is under-estimating the target. I include a minimal example below. If I simplify the model architecture, the predictions are closer to the true values. But what confuses me is: if the complex model overfits, why aren't the predictions extremely close to the training true values instead of being systematically off like that? (The plot is for training data.)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from sklearn.metrics import mean_squared_error
def create_dataset(num_series=1, num_steps=1000, period=500, mu=1, sigma=0.3):
    noise = np.random.normal(mu, sigma, size=(num_series, num_steps))
    sin_minumPi_Pi = np.sin(np.tile(np.linspace(-np.pi, np.pi, period), int(num_steps / period)))
    sin_Zero_2Pi = np.sin(np.tile(np.linspace(0, 2 * np.pi, period), int(num_steps / period)))
    pattern = np.concatenate((np.tile(sin_minumPi_Pi.reshape(1, -1),
                                      (int(np.ceil(num_series / 2)), 1)),
                              np.tile(sin_Zero_2Pi.reshape(1, -1),
                                      (int(np.floor(num_series / 2)), 1))
                              ),
                             axis=0
                             )
    target = noise + pattern
    return target[0]
avail=create_dataset(mu=5)
window_size = 7
def getdata(data, window_size):
    X, y = np.array([1] * window_size), np.array([])
    for i in range(window_size, len(data)):
        X = np.vstack((X, data[i - window_size:i]))
        y = np.append(y, data[i:i + 1])
    return X[1:], y
X,y = getdata(avail,window_size)
def train_model(X, y, a_dim=100, epoch=50, batch_size=32, d=0.2):
    model = Sequential()
    model.add(Dense(a_dim, activation='relu', input_dim=X.shape[1]))
    model.add(Dropout(d))
    model.add(Dense(a_dim, activation='relu'))
    model.add(Dropout(d))
    model.add(Dense(a_dim, activation='relu'))
    model.add(Dropout(d))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(X, y, epochs=epoch, batch_size=batch_size, verbose=2)
    return model
model = train_model(X,y)
plt.plot(model.predict(X)[:,0])
plt.plot(y)
plt.show()
It is because your model has been built to behave this way.
The main purpose of a model is to learn a function that fits unseen data as well as possible. To do so, your model learns from the training samples. The problem is that during training the model tends to overfit the training samples, losing its generalization ability, i.e. the ability to perform well on unseen data.
To avoid this, we use several techniques; one of them is dropout.
You used it as well:
model.add(Dropout(d))
with the default parameter d=0.2:
def train_model(X,y,a_dim=100,epoch=50,batch_size=32,d=0.2):
As I wrote above, this is there to prevent overfitting on your training data, and that is why your model is consistently off on the training targets while getting a better estimate on unseen data.
Passing a different dropout value to your train_model() function, you will get a better fit to your training data:
model = train_model(X, y, d=0.0)
plt.plot(model.predict(X)[:,0])
plt.plot(y)
plt.show()
Out: (plot omitted)
I'm trying to understand how to implement neural networks, so I made my own dataset. Xtrain is random floats from numpy; Ytrain is sign(sin(1/x^3)).
Trying to implement a neural network gave me very poor results: 30% accuracy. A Random Forest with 100 trees gives 97%. But I heard that NNs can approximate any function. What is wrong in my understanding?
import numpy as np
import keras
import math
from sklearn.ensemble import RandomForestClassifier as RF
train = np.random.rand(100000)
test = np.random.rand(100000)
def g(x):
    if math.sin(2*3.14*x) > 0:
        if math.cos(2*3.14*x) > 0:
            return 0
        else:
            return 1
    else:
        if math.cos(2*3.14*x) > 0:
            return 2
        else:
            return 3

def f(x):
    x = (1/x) ** 3
    res = [0, 0, 0, 0]
    res[g(x)] = 1
    return res
ytrain = np.array([f(x) for x in train])
ytest = np.array([f(x) for x in test])
train = np.array([[x] for x in train])
test = np.array([[x] for x in test])
from keras.models import Sequential
from keras.layers import Dense, Activation, Embedding, LSTM
model = Sequential()
model.add(Dense(100, input_dim=1))
model.add(Activation('sigmoid'))
model.add(Dense(100))
model.add(Activation('sigmoid'))
model.add(Dense(100))
model.add(Activation('sigmoid'))
model.add(Dense(4))
model.add(Activation('softmax'))
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
P.S. I tried out many layers, activation functions, loss functions, optimizers, but never got more than 30% accuracy :(
I suspect that the 30% accuracy comes from a combination of a small learning rate and a small number of training steps.
I ran your code snippet with model.fit(train, ytrain, nb_epoch=5, batch_size=32); after 5 epochs of training it yields about 28% accuracy. With the same setting but increasing the training steps to nb_epoch=50, the loss drops to ~1.157 and the accuracy rises to 40%. Further increasing the training steps should let the model converge further. Other than that, you can also try to configure the model with a larger learning rate, which could make convergence faster:
from keras.optimizers import SGD

model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.1, momentum=0.9, nesterov=True), metrics=['accuracy'])
Be careful not to set the learning rate too large, though, otherwise your loss could blow up.
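Putting the two suggestions together (larger learning rate and more epochs), a rough sketch of the training and evaluation calls might be as follows; note that recent Keras versions use epochs= instead of nb_epoch=:

# Train longer with the SGD settings compiled above, then check test accuracy
model.fit(train, ytrain, epochs=50, batch_size=32)   # nb_epoch=50 on older Keras versions
loss, acc = model.evaluate(test, ytest, verbose=0)
print('test loss: %.3f, test accuracy: %.3f' % (loss, acc))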
EDIT:
NNs are known for their potential to model extremely complex functions; however, whether or not the model actually produces good performance is a matter of how the model is designed and trained, and of many other matters related to the specific application.
Zhongyu Kuang's answer is correct in stating that you may need to train it longer or with a different learning rate.
I'll add that the deeper your network, the longer you'll need to train it before it converges. For a relatively simple function like sign(sin(1/x^3)), you may be able to get away with a smaller network than the one you're using.
Additionally, softmax probably isn't the best output layer. You just need to yield -1 or 1. A single tanh unit seems like it would do well. softmax is generally used when you want to learn a probability distribution over a finite set. (You'll probably want to switch your error function from cross entropy to mean square error for similar reasons.)
Try a network with one sigmoidal hidden layer and an output layer with just one tanh unit. Then play around with the layer size and learning rate. Maybe add a second hidden layer if you can't get results with just one, but I wouldn't be surprised if it's unnecessary.
Addendum: In this approach, you'll replace f(x) with a direct calculation of the target function instead of the one-hot vector you're using currently.
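A minimal sketch of that suggestion (the hidden-layer size, learning rate, and epoch count below are arbitrary illustrative choices, not values from the answer):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# Direct targets in {-1, +1} instead of one-hot vectors
train = np.random.rand(100000, 1)
ytrain = np.sign(np.sin((1.0 / train) ** 3))

model = Sequential()
model.add(Dense(50, input_dim=1, activation='sigmoid'))  # one sigmoidal hidden layer
model.add(Dense(1, activation='tanh'))                   # single tanh output unit
model.compile(optimizer=SGD(lr=0.1), loss='mean_squared_error')
model.fit(train, ytrain, epochs=20, batch_size=32, verbose=0)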
Below is a simple example of a multi-class classification task with the IRIS data.
import seaborn as sns
import numpy as np
from sklearn.cross_validation import train_test_split
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.regularizers import l2
from keras.utils import np_utils
#np.random.seed(1335)
# Prepare data
iris = sns.load_dataset("iris")
iris.head()
X = iris.values[:, 0:4]
y = iris.values[:, 4]
# Make test and train set
train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.5, random_state=0)
################################
# Evaluate Keras Neural Network
################################
# Make ONE-HOT
def one_hot_encode_object_array(arr):
    '''One hot encode a numpy array of objects (e.g. strings)'''
    uniques, ids = np.unique(arr, return_inverse=True)
    return np_utils.to_categorical(ids, len(uniques))
train_y_ohe = one_hot_encode_object_array(train_y)
test_y_ohe = one_hot_encode_object_array(test_y)
model = Sequential()
model.add(Dense(16, input_shape=(4,),
                activation="tanh",
                W_regularizer=l2(0.001)))
model.add(Dropout(0.5))
model.add(Dense(3, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
# Actual modelling
# If you increase the epochs, the accuracy will increase until it drops at a
# certain point: epoch 50 gives accuracy 0.99, and after that it drops to
# 0.977 with epoch 70.
hist = model.fit(train_X, train_y_ohe, verbose=0, nb_epoch=100, batch_size=1)
score, accuracy = model.evaluate(test_X, test_y_ohe, batch_size=16, verbose=0)
print("Test fraction correct (NN-Score) = {:.2f}".format(score))
print("Test fraction correct (NN-Accuracy) = {:.2f}".format(accuracy))
My question is: how do people usually decide the size of the layers?
For example based on code above we have:
model.add(Dense(16, input_shape=(4,),
                activation="tanh",
                W_regularizer=l2(0.001)))
model.add(Dense(3, activation='sigmoid'))
where the first parameter of Dense is 16 and the second is 3.
Why do the two layers use two different values for Dense?
How do we choose the best value for Dense?
Basically it is just trial and error. Those are called hyperparameters and should be tuned on a validation set (split from your original data into train/validation/test).
Tuning just means trying different combinations of parameters and keeping the combination with the lowest loss value or the best accuracy on the validation set, depending on the problem.
There are two basic methods:
Grid search: For each parameter, decide a range and steps into that range, like 8 to 64 neurons in powers of two (8, 16, 32, 64), and try each combination of the parameters. This obviously requires an exponential number of models to be trained and tested, and takes a lot of time.
Random search: Do the same, but just define a range for each parameter and try random sets of parameters drawn from a uniform distribution over each range. You can try as many parameter sets as you want, for as long as you can. This is just an informed random guess.
Unfortunately, there is no other way to tune such parameters. As for layers having different numbers of neurons, that can come from the tuning process, or you can also see it as dimensionality reduction, like a compressed version of the previous layer.
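For illustration, here is a minimal random-search sketch over the hidden-layer size, reusing the variables from the question and a simplified version of its architecture (in practice you would tune against a separate validation split rather than the test set):

import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense

def build_model(n_hidden):
    # Simplified version of the question's architecture, with a variable hidden-layer size
    model = Sequential()
    model.add(Dense(n_hidden, input_shape=(4,), activation='tanh'))
    model.add(Dense(3, activation='sigmoid'))
    model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
    return model

best_acc, best_size = 0.0, None
for _ in range(10):                                    # 10 random trials
    n_hidden = int(np.random.choice([8, 16, 32, 64]))  # sample a candidate layer size
    model = build_model(n_hidden)
    model.fit(train_X, train_y_ohe, epochs=50, batch_size=1, verbose=0)  # nb_epoch= on older Keras
    _, acc = model.evaluate(test_X, test_y_ohe, verbose=0)
    if acc > best_acc:
        best_acc, best_size = acc, n_hidden
print(best_size, best_acc)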
There is no known way to determine a good network structure from the number of inputs or outputs alone. It depends on the number of training examples, the batch size, the number of epochs, basically on every significant parameter of the network.
Moreover, a high number of units can introduce problems like overfitting and exploding gradients. On the other hand, a lower number of units can cause a model to have high bias and low accuracy. Once again, it depends on the size of the data used for training.
Sadly, it comes down to trying different values and seeing which give the best adjustments. You may choose the combination that gives you the lowest loss and validation loss values, as well as the best accuracy for your dataset, as said in the previous post.
You could set the number of units in proportion to the number of classes, something like:
# Build the model
model = Sequential()
model.add(Dense(num_classes * 8, input_shape=(shape_value,), activation = 'relu' ))
model.add(Dropout(0.5))
model.add(Dense(num_classes * 4, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes * 2, activation = 'relu'))
model.add(Dropout(0.2))
#Output layer
model.add(Dense(num_classes, activation = 'softmax'))
The model above shows an example of a categorisation system. num_classes is the number of different categories the system has to choose from. For instance, in the iris dataset used above, we have:
Iris Setosa
Iris Versicolour
Iris Virginica
num_classes = 3
However, this could still lead to worse results than other values. We need to adjust the parameters to the training dataset by making several different tries and then analysing the results, looking for the best combination of parameters.
My suggestion is to use EarlyStopping(), then check the number of epochs it ran for and the accuracy alongside the test loss.
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

lrd = ReduceLROnPlateau(monitor='val_loss', patience=2, verbose=1, factor=0.8, min_lr=1e-6)
es = EarlyStopping(verbose=1, patience=2)

his = classifier.fit(X_train, y_train, epochs=500, batch_size=128, validation_split=0.1,
                     verbose=1, callbacks=[lrd, es])