Unable to train neural network properly - python
I am trying to train a neural network (NN), implemented in Keras, to learn the following function.
y(n) = y(n-1)*0.9 + x(n)*0.1
The idea is to take a signal as train_x, pass it through the function above to get train_y, and use the pair (train_x, train_y) as training data.
import numpy as np
from keras.models import Sequential
from keras.layers.core import Activation, Dense
from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt

# Step input signal (500 samples) and its low-pass-filtered version
train_x = np.concatenate((np.ones(100)*120, np.ones(150)*150, np.ones(150)*90, np.ones(100)*110), axis=None)
train_y = np.ones(train_x.size)*train_x[0]
alpha = 0.9
# first-order IIR low-pass filter: y[n] = alpha*y[n-1] + (1 - alpha)*x[n]
for i in range(train_x.size):
    train_y[i] = train_y[i-1]*alpha + train_x[i]*(1 - alpha)
[plot: train_x data vs. train_y data]
The function y(n) in question is a low-pass filter: it keeps the output from following abrupt changes in x(n), as shown in the plot.
Then I build an NN, fit it on (train_x, train_y), and plot the training loss:
model = Sequential()
model.add(Dense(128, kernel_initializer='normal', input_dim=1, activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='linear'))

model.compile(loss='mean_absolute_error',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(train_x, train_y, epochs=200, verbose=0)
print(history.history['loss'][-1])
plt.plot(history.history['loss'])
plt.show()
[plot: training loss over 200 epochs]
And the final loss value is approximately 2.9, which I thought was pretty good. But then the accuracy plot looked like this:
[plot: accuracy over 200 epochs]
So when I check the prediction of the neural network on the data it was trained on:
plt.plot(model.predict(train_x))
plt.plot(train_x)
plt.show()
[plot: train_x vs. model.predict(train_x)]
The predicted values are just offset a little, and that's all. I tried changing the activation functions and the number of neurons and layers, but the result is still the same. What am I doing wrong?
---- Edit ----
Made the NN accept a 2-dimensional input, and it works as intended:
import numpy as np
from keras.models import Sequential
from keras.layers.core import Activation, Dense
from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt

train_x = np.concatenate((np.ones(100)*120, np.ones(150)*150, np.ones(150)*90, np.ones(100)*110), axis=None)
train_y = np.ones(train_x.size)*train_x[0]
alpha = 0.9
for i in range(train_x.size):
    train_y[i] = train_y[i-1]*alpha + train_x[i]*(1 - alpha)

# 2-column input: [x value, y value] for each sample
train = np.empty((500, 2))
for i in range(500):
    train[i][0] = train_x[i]
    train[i][1] = train_y[i]

model = Sequential()
model.add(Dense(128, kernel_initializer='normal', input_dim=2, activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='linear'))

model.compile(loss='mean_absolute_error',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(train, train_y, epochs=100, verbose=0)
print(history.history['loss'][-1])
plt.plot(history.history['loss'])
plt.show()
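To confirm the fit, the same kind of plot as before can be drawn for the 2-column input (a quick check using the train array defined above):

# compare the model output on the training inputs with the target signal
plt.plot(model.predict(train))
plt.plot(train_y)
plt.show()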
If I execute your code, I get the following plot for the X-Y values:
If I didn't miss something important here and you really feed this to your neural net, you probably can't expect better results. The reason is that a neural net is just a function that can compute only one output vector for a given input. In your case the output vector consists of a single element (your y value), but as you can see in the diagram above, for x = 90 there is not just one single output. So what you feed to your neural net cannot really be represented as a function of x alone, and the network most likely ends up approximating something like the straight line between ~(90, 145) and ~(150, 150), i.e. the "upper line" in the diagram.
The neural network you're building is a simple multi-layer perceptron with one input node and one output node. This means that it is essentially a function that accepts one real number and returns one real number -- context is not passed in and can therefore not be considered. The expression
model.predict(train_x)
does not evaluate a vector-to-vector function on the vector train_x; it evaluates a number-to-number function for every number in train_x and returns the list of results. This is why you get flat segments in the plot of train_x against the predictions: the same input number produces the same output number every time.
Given this constraint, the approximation is actually quite good. For example, for x values of 150 the network has seen many y values close to 150 and a few lower ones, but never anything above 150. So, given an x of 150, it predicts a y value slightly below 150.
The function you wanted, on the other hand, refers to the previous function value and will need information about this in its input. If what you're trying to build is a function that accepts a sequence of real numbers and returns a sequence of real numbers, you could do that with a many-to-many recurrent network (and you're going to need a lot more training data than one example sequence), but since you can calculate the function directly, why bother with neural networks at all? There's no need to whip out the chainsaw where a butter knife will do.
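For reference, here is a minimal sketch of the many-to-many recurrent idea, assuming Keras' SimpleRNN and the train_x / train_y arrays from the question; it only illustrates the shape of such a model, not a tuned or converged solution:

from keras.layers import SimpleRNN, TimeDistributed

# One training sequence of 500 timesteps with 1 feature each;
# in practice you would want many such sequences (and scaled inputs).
seq_x = train_x.reshape(1, -1, 1)
seq_y = train_y.reshape(1, -1, 1)

rnn = Sequential()
rnn.add(SimpleRNN(16, return_sequences=True, input_shape=(None, 1)))
rnn.add(TimeDistributed(Dense(1)))
rnn.compile(loss='mean_absolute_error', optimizer='adam')
rnn.fit(seq_x, seq_y, epochs=500, verbose=0)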
Related
Prediction Interval for Neural Net in Python
I'm currently using Keras to create a neural net in Python. I have a basic model and the code looks like this:

from keras.layers import Dense
from keras.models import Sequential

model = Sequential()
model.add(Dense(23, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(500, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation="relu"))
model.compile(loss='mean_squared_error', optimizer='adam')

It works well and gives me good predictions for my use case. However, I would like to be able to use a Variational Gaussian Process layer to give me an estimate of the prediction interval as well. I'm new to this type of layer and am struggling a bit to implement it. The TensorFlow documentation on it can be found here: https://www.tensorflow.org/probability/api_docs/python/tfp/layers/VariationalGaussianProcess However, I'm not seeing that same layer in the Keras library. For further reference, I'm trying to do something similar to what was done in this article: https://blog.tensorflow.org/2019/03/regression-with-probabilistic-layers-in.html There seems to be a bit more complexity when you have 23 inputs vs. one, which I'm not understanding. I'm also open to other methods of achieving the target objective. Any examples of how to do this or insights into other approaches would be greatly appreciated!
tensorflow_probability is a separate library, but it works well with Keras and TensorFlow. You can add its custom layers to your code and turn the model into a probabilistic one. If your goal is just to get a prediction interval, it is simpler to use the DistributionLambda layer. Your code would then be as follows:

from keras.layers import Dense
from keras.models import Sequential
from sklearn.datasets import make_regression
import tensorflow_probability as tfp
import tensorflow as tf

tfd = tfp.distributions

# Sample data
X, y = make_regression(n_samples=100, n_features=23, noise=4.0, bias=15)

# Loss function: negative log likelihood
negloglik = lambda y, p_y: -p_y.log_prob(y)

# Model
model = Sequential()
model.add(Dense(23, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(500, kernel_initializer='normal', activation='relu'))
model.add(Dense(2))
model.add(tfp.layers.DistributionLambda(
    lambda t: tfd.Normal(loc=t[..., :1],
                         scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))))

model.compile(loss=negloglik, optimizer='adam')
model.fit(X, y, epochs=250, verbose=0)

After training your model, you can get the prediction distribution with the following lines:

yhat = model(X)        # make predictions
means = yhat.mean()    # prediction means
stds = yhat.stddev()   # prediction standard deviations
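From those, an approximate 95% prediction interval can be read off under the normal assumption built into the DistributionLambda layer above, for example:

lower = means - 1.96 * stds   # lower bound of the ~95% interval
upper = means + 1.96 * stds   # upper bound of the ~95% interval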
How to improve the F1 score for an imbalanced multiclass classification problem? Tried using SMOTE but it gives bad results
Dataset: train.csv

Approach: I have four classes to be predicted and they are really very imbalanced, so I tried using SMOTE and a feed-forward network, but using SMOTE gives very poor results on the test data compared to the original dataset.

Model architecture:

import tensorflow as tf
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout, Flatten

model = tf.keras.Sequential()
model.add(Dense(512, activation='relu', input_shape=(7, )))
model.add(BatchNormalization())
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(64, activation='relu'))
model.add(Dense(4, activation='softmax'))

earlystopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=40,
    mode="auto",
    restore_best_weights=True,
)

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

So how should I approach this problem and increase the F1 score on the test dataset? Any help is appreciated.
Below is an explanation of what could be the best approach for your case.

SMOTE

SMOTE balances out the data by upsampling the minority class: if you have a distribution like Class A with 15000 records and Class B with 200 records, it will upsample Class B to 15000 records too. Generating that many samples from only 200 original records can make it very hard for the model to learn to differentiate between the classes, since Class B has been inflated from 200 to 15000 records derived from the same small set of examples.

Possible solutions

Instead of SMOTE I would recommend trying stratified sampling for the train/test split and then building the model on top of it. Passing class weights as a parameter is another good approach, and it is available for almost all ML algorithms; for Keras, class weights can be passed to model.fit, as in the sketch below.
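A minimal sketch of the class-weight idea for the Keras model above (my illustration; X_train, y_int, and y_train are assumed names for the features, the integer-encoded class labels, and the one-hot targets used by categorical_crossentropy):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# y_int: integer-encoded class labels (assumed name)
classes = np.unique(y_int)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_int)
class_weights = {int(c): w for c, w in zip(classes, weights)}  # e.g. {0: 0.4, 1: 3.2, ...}

model.fit(X_train, y_train,
          validation_split=0.2,
          epochs=100,
          callbacks=[earlystopping],
          class_weight=class_weights)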
Getting a prediction from Keras with a 1-D array
I have the following code:

from numpy import loadtxt
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from time import sleep

dataset = loadtxt('dataset.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]

model = Sequential()
model.add(Dense(192, input_dim=8, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=600, batch_size=10)

_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))

When I run it, it trains no problem, and most of the time I even get 100% accuracy, but I'm having trouble getting predictions from the model. As you can see from the following sample of the training data, the first 8 entries of each row are inputs, and a 1 or 0 is the output.

6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
5,116,74,0,0,25.6,0.201,30,0
3,78,50,32,88,31.0,0.248,26,1
10,115,0,0,0,35.3,0.134,29,0
2,197,70,45,543,30.5,0.158,53,1
8,125,96,0,0,0.0,0.232,54,1
4,110,92,0,0,37.6,0.191,30,0
10,168,74,0,0,38.0,0.537,34,1
10,139,80,0,0,27.1,1.441,57,0
1,189,60,23,846,30.1,0.398,59,1
5,166,72,19,175,25.8,0.587,51,1
7,100,0,0,0,30.0,0.484,32,1
0,118,84,47,230,45.8,0.551,31,1
7,107,74,0,0,29.6,0.254,31,1

I want to enter "6,148,72,35,0,33.6,0.627,50" into the code and have the model give me an output based on that. What should I do?
Alright, that was fast, but it occurred to me that I just needed to wrap another set of brackets around the list that was already there, making it a 2-D array and allowing Keras to make a prediction.
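For example (an illustration of that fix, using the first row of the training data without its label):

sample = np.array([[6, 148, 72, 35, 0, 33.6, 0.627, 50]])  # note the double brackets: shape (1, 8)
prediction = model.predict(sample)
print(prediction)  # a single probability between 0 and 1 from the sigmoid output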
Why am I getting a horizontal line (almost zero) from the neural network instead of the desired curve?
I am trying to use a neural network for my regression problem in Python, but the output of the neural network is a straight horizontal line at zero. I have one input and obviously one output. Here is my code:

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import KFold, cross_val_score

def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(1, input_dim=1, kernel_initializer='normal', activation='relu'))
    model.add(Dense(4, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', metrics=['mse'], optimizer='adam')
    model.summary()
    return model

# evaluate model
estimator = KerasRegressor(build_fn=baseline_model, epochs=50, batch_size=64, validation_split=0.2, verbose=1)
kfold = KFold(n_splits=10)
results = cross_val_score(estimator, X_train, y_train, cv=kfold)

Here are the plots of the NN prediction vs. the target for both training and test data.

[plot: training data]
[plot: test data]

I have also tried different weight initializers (Xavier and He) with no luck! I really appreciate your help.
First of all, correct your syntax when adding the dense layers to the model: replace the double equals == with a single equals = for kernel_initializer, like below:

model.add(Dense(1, input_dim=1, kernel_initializer='normal', activation='relu'))

Then, to improve performance, do the following:

Increase the number of hidden neurons in the hidden layers.
Increase the number of hidden layers.

If you still have the same problem, try changing the optimizer and activation function. Tuning the hyperparameters may help you converge to the solution.

EDIT 1

You also have to fit the estimator after cross-validation, like below:

estimator.fit(X_train, y_train)

and then you can test on the test data as follows:

prediction = estimator.predict(X_test)

from sklearn.metrics import accuracy_score
accuracy_score(Y_test, prediction)
Why do I fail to predict a linear equation (Y=2*x) with Keras?
I tried to predict a linear equation (Y = 2*x) with Keras, but it failed. With a sigmoid activation function I get rectangular predictions; with ReLU I get NaNs. What is the cause? How could I change the code to predict y = 2*x?

import numpy as np
from keras.layers import Dense, Activation
from keras.models import Sequential
import matplotlib.pyplot as plt
import math
import time

x = np.arange(-100, 100, 0.5)
y = x*2

model = Sequential()
model.add(Dense(10, input_shape=(1,)))
model.add(Activation('sigmoid'))
model.add(Dense(20))
model.add(Activation('sigmoid'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='SGD', metrics=['mean_squared_error'])

t1 = time.clock()
for i in range(40):
    model.fit(x, y, epochs=1000, batch_size=len(x), verbose=0)
    predictions = model.predict(x)
    print(i, " ", np.mean(np.square(predictions - y)), " t: ", time.clock() - t1)
    plt.hold(False)
    plt.plot(x, y, 'b', x, predictions, 'r--')
    plt.hold(True)
    plt.ylabel('Y / Predicted Value')
    plt.xlabel('X Value')
    plt.title([str(i), " Loss: ", np.mean(np.square(predictions - y)), " t: ", str(time.clock() - t1)])
    plt.pause(0.001)

#plt.savefig("fig2.png")
plt.show()
Although at first glance it may seem that the default learning rate is inappropriate, the real problem here is that the sigmoid activation is inappropriate. Why? Because your desired output should NOT be bounded, but using sigmoid implies a bounded output.

To be more precise, your last layer computes an output y as y = \sum_i w_i * x_i + b, where x_i is the output of the second-to-last layer, which is activated by sigmoid, so x_i \in [0, 1]. For this reason, your output y is bounded as y \in [-V + b, +V + b], where V = |w_0| + |w_1| + ... + |w_19|, i.e. the L1 norm of the weight matrix W. Since W is learned from your training data, it is safe to conclude that your model will NOT generalize to test data whose values lie outside the range (min(x_train), max(x_train)).

How to fix?

Thought 1: for this simple problem you don't actually need any nonlinearity. Simply use a linear MLP as follows:

model = Sequential()
model.add(Dense(1, input_shape=(1,)))
model.compile(loss='mse', optimizer='adam')

I tested it, and it should converge in 200 epochs with an MSE around 1e-5.

Thought 2: use a different activation function that does not suffer from the bounded-output issue, e.g. relu (note: tanh is also inappropriate for the same reason):

model = Sequential()
model.add(Dense(10, input_shape=(1,)))
model.add(Activation('relu'))
model.add(Dense(20))
model.add(Activation('relu'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

I also tested this model, and it should converge even faster with a comparable MSE.