Unable to train neural network properly - python

I am trying to train a neural network (NN), implemented through Keras, to learn the following function.
y(n) = y(n-1)*0.9 + x(n)*0.1
The idea is to use a signal as the train_x data and pass it through the above function to get the train_y data, giving us the (train_x, train_y) training data.
import numpy as np
from keras.models import Sequential
from keras.layers.core import Activation, Dense
from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt
train_x = np.concatenate((np.ones(100)*120,np.ones(150)*150,np.ones(150)*90,np.ones(100)*110), axis=None)
train_y = np.ones(train_x.size)*train_x[0]
alpha = 0.9
for i in range(train_x.size):
    train_y[i] = train_y[i-1]*alpha + train_x[i]*(1 - alpha)
[plot: train_x data vs. train_y data]
The function in question, y(n), is a low-pass filter: it keeps the output from following abrupt changes in x(n), as shown in the plot.
Then I build a NN, fit it with (train_x, train_y), and plot the training loss:
model = Sequential()
model.add(Dense(128, kernel_initializer='normal', input_dim=1, activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='linear'))
model.compile(loss='mean_absolute_error',
              optimizer='adam',
              metrics=['accuracy'])
history = model.fit(train_x, train_y, epochs=200, verbose=0)
print(history.history['loss'][-1])
plt.plot(history.history['loss'])
plt.show()
[plot: loss over 200 epochs]
And the final loss value is approximately 2.9, which I thought was pretty good. But then the accuracy plot looked like this:
[plot: accuracy over 200 epochs]
So I checked the predictions of the neural network on the data it was trained on:
plt.plot(model.predict(train_x))
plt.plot(train_x)
plt.show()
[plot: train_x vs. model predictions]
The predicted values are just offset slightly from the input, and that's all. I tried changing the activation functions and the number of neurons and layers, but the result is still the same. What am I doing wrong?
---- Edit ----
I made the NN accept a 2-dimensional input, and it works as intended:
import numpy as np
from keras.models import Sequential
from keras.layers.core import Activation, Dense
from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt
train_x = np.concatenate((np.ones(100)*120,np.ones(150)*150,np.ones(150)*90,np.ones(100)*110), axis=None)
train_y = np.ones(train_x.size)*train_x[0]
alpha = 0.9
for i in range(train_x.size):
    train_y[i] = train_y[i-1]*alpha + train_x[i]*(1 - alpha)
train = np.empty((500,2))
for i in range(500):
    train[i][0] = train_x[i]
    train[i][1] = train_y[i]
model = Sequential()
model.add(Dense(128, kernel_initializer='normal', input_dim=2, activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='linear'))
model.compile(loss='mean_absolute_error',
              optimizer='adam',
              metrics=['accuracy'])
history = model.fit(train, train_y, epochs=100, verbose=0)
print(history.history['loss'][-1])
plt.plot(history.history['loss'])
plt.show()
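The edit above only shows training. One hypothetical way to use such a two-input model at inference time is to feed the previous prediction back in as the second input; the following is only a sketch under that assumption, reusing the trained model and train_x from above:

# Hypothetical recursive inference: the second input feature is the previous
# prediction (assumes `model` and `train_x` defined above).
preds = np.empty(train_x.size)
prev_y = train_x[0]                            # same initial condition as train_y
for i in range(train_x.size):
    inp = np.array([[train_x[i], prev_y]])     # shape (1, 2)
    prev_y = float(model.predict(inp, verbose=0)[0][0])
    preds[i] = prev_y
plt.plot(preds)
plt.plot(train_x)
plt.show()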

If I execute your code, I get the following plot for the X-Y values:
If I didn't miss something important here and you really feed that to your neural net, you probably can't expect better results. The reason is that a neural net is just a function that can only calculate one output vector for one input vector. In your case the output vector consists of only one element (your y value), but as you can see in the diagram above, for x = 90 there is not just one single output. So what you feed to your neural net cannot really be computed as a function of x alone, and most likely the network ends up fitting the straight line between roughly (90, 145) and (150, 150), i.e. the "upper line" in the diagram.

The neural network you're building is a simple multi-layer perceptron with one input node and one output node. This means that it is essentially a function that accepts one real number and returns one real number -- context is not passed in and can therefore not be considered. The expression
model.predict(train_x)
does not evaluate a vector-to-vector function for the vector train_x; it evaluates a number-to-number function for every number in train_x and returns the list of results. This is why you get flat segments in the prediction plot above: the same input number produces the same output number every time.
Given this constraint, the approximation is actually quite good. For example, the network has seen for x values of 150 many y values of 150 and a few lower ones but never anything above 150. So, given an x of 150, it predicts a y value of slightly lower than 150.
The function you wanted, on the other hand, refers to the previous function value and will need information about this in its input. If what you're trying to build is a function that accepts a sequence of real numbers and returns a sequence of real numbers, you could do that with a many-to-many recurrent network (and you're going to need a lot more training data than one example sequence), but since you can calculate the function directly, why bother with neural networks at all? There's no need to whip out the chainsaw where a butter knife will do.
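For completeness, here is a minimal sketch of computing the target function directly, with no network involved; the only assumption is the same filter coefficient alpha = 0.9 used in the question:

import numpy as np

def low_pass(x, alpha=0.9):
    # y(n) = alpha*y(n-1) + (1 - alpha)*x(n), the recursion from the question,
    # with y(-1) initialized to x[0] as in the question's loop
    y = np.empty(len(x), dtype=float)
    prev = x[0]
    for n, xn in enumerate(x):
        prev = alpha*prev + (1 - alpha)*xn
        y[n] = prev
    return y

# Equivalently (with zero initial state): scipy.signal.lfilter([1 - alpha], [1.0, -alpha], x)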

Related

Prediction Interval for Neural Net in Python

I'm currently using keras to create a neural net in python. I have a basic model and the code looks like this:
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(23, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(500, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation="relu"))
model.compile(loss='mean_squared_error', optimizer='adam')
It works well and gives me good predictions for my use case. However, I would like to be able to use a Variational Gaussian Process layer to give me an estimate for the prediction interval as well. I'm new to this type of layer and am struggling a bit to implement it. The tensorflow documentation on it can be found here:
https://www.tensorflow.org/probability/api_docs/python/tfp/layers/VariationalGaussianProcess
However, I'm not seeing that same layer in the keras library. For further reference, I'm trying to do something similar to what was done in this article:
https://blog.tensorflow.org/2019/03/regression-with-probabilistic-layers-in.html
There seems to be a bit more complexity when you have 23 inputs instead of one, which I'm not understanding. I'm also open to other methods of achieving the target objective. Any examples of how to do this, or insights on other approaches, would be greatly appreciated!
tensorflow_probability is a separate library, but it is designed to be used with Keras and TensorFlow. You can add its custom layers to your code and turn the model into a probabilistic one. If your goal is just to get a prediction interval, it is simpler to use the DistributionLambda layer. Your code would then look as follows:
from keras.layers import Dense
from keras.models import Sequential
from sklearn.datasets import make_regression
import tensorflow_probability as tfp
import tensorflow as tf
tfd = tfp.distributions
# Sample data
X, y = make_regression(n_samples=100, n_features=23, noise=4.0, bias=15)
# loss function: negative log likelihood
negloglik = lambda y, p_y: -p_y.log_prob(y)
# Model
model = Sequential()
model.add(Dense(23, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(500, kernel_initializer='normal', activation='relu'))
model.add(Dense(2))
model.add(tfp.layers.DistributionLambda(
    lambda t: tfd.Normal(loc=t[..., :1],
                         scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))))
model.compile(loss=negloglik, optimizer='adam')
model.fit(X, y, epochs=250, verbose=0)
After training your model, you can get your prediction distribution with the following lines:
yhat = model(X) # make predictions
means = yhat.mean() # prediction means
stds = yhat.stddev() # prediction standard deviation
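From those means and standard deviations you can then form a prediction interval; for example, a rough 95% interval under the fitted Normal could be computed like this (a sketch, reusing means and stds from above):

# Approximate 95% prediction interval from the fitted Normal distribution
lower = means - 1.96 * stds
upper = means + 1.96 * stds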

How to improve the F1 score for an imbalanced multiclass classification problem? Tried using SMOTE but it gives bad results

Dataset: train.csv
Approach
I have four classes to predict and they are very imbalanced, so I tried using SMOTE with a feed-forward network, but using SMOTE gives very poor results on the test data compared to the original dataset.
model architecture
#model architecture
import tensorflow as tf
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout, Flatten
model = tf.keras.Sequential()
model.add(Dense(512, activation='relu', input_shape=(7, )))
model.add(BatchNormalization())
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(64, activation='relu'))
model.add(Dense(4, activation='softmax'))
earlystopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=40,
    mode="auto",
    restore_best_weights=True,
)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
So how should I approach this problem and increase the F1 score on the test dataset?
Any help is appreciated.
Below is an explanation of what could be the best approach for your case.
SMOTE
SMOTE balances out the data by upsampling the minority class, so even with a distribution like Class A having 15000 records and Class B having 200 records, it will bring Class B up to 15000 records as well.
Generating that many synthetic samples from only 200 real records can make it very hard for the model to learn and to differentiate between classes, because Class B has effectively been inflated from 200 to 15000 records based on very little underlying information.
Possible Solutions
Instead of SMOTE, I would recommend trying stratified sampling for the train/test split and then building the model on top of that.
Passing class weights as a parameter is another good approach, and it is available for almost all ML algorithms. In your case, for Keras, you can refer here; it could be very helpful. A sketch of both ideas is shown below.
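As a rough illustration of those two suggestions (the feature matrix X, the integer labels y in 0..3, and the model and earlystopping callback from the question are all assumptions here):

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

# Stratified split keeps the class proportions the same in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Weight each class inversely to its frequency and pass that to fit()
weights = compute_class_weight(class_weight='balanced',
                               classes=np.unique(y_train), y=y_train)
class_weight = {int(c): w for c, w in zip(np.unique(y_train), weights)}

model.fit(X_train, tf.keras.utils.to_categorical(y_train, num_classes=4),
          validation_split=0.1, epochs=100,
          callbacks=[earlystopping], class_weight=class_weight)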

Getting a prediction from Keras with a 1-D array

I have the following code:
from numpy import loadtxt
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from time import sleep
dataset = loadtxt('dataset.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
model = Sequential()
model.add(Dense(192, input_dim=8, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=600, batch_size=10)
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))
When I run it, it trains with no problem, and most of the time I even get 100% accuracy, but I'm having trouble getting predictions from the model. As you can see from the following sample of the training data, the first 8 entries of each row are the inputs and the 1 or 0 at the end is the output.
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
5,116,74,0,0,25.6,0.201,30,0
3,78,50,32,88,31.0,0.248,26,1
10,115,0,0,0,35.3,0.134,29,0
2,197,70,45,543,30.5,0.158,53,1
8,125,96,0,0,0.0,0.232,54,1
4,110,92,0,0,37.6,0.191,30,0
10,168,74,0,0,38.0,0.537,34,1
10,139,80,0,0,27.1,1.441,57,0
1,189,60,23,846,30.1,0.398,59,1
5,166,72,19,175,25.8,0.587,51,1
7,100,0,0,0,30.0,0.484,32,1
0,118,84,47,230,45.8,0.551,31,1
7,107,74,0,0,29.6,0.254,31,1
What I want is to enter "6,148,72,35,0,33.6,0.627,50" into the code and have the model give me an output based on that. What should I do?
Alright, that was fast, but it occurred to me that I just needed to wrap the existing list in another list, making it a 2D array and allowing Keras to make a prediction.
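In code, that wrapping looks roughly like this (a sketch using the first training row from the question and the trained model from above):

import numpy as np

# One sample wrapped in an outer list -> shape (1, 8), as Keras expects
sample = np.array([[6, 148, 72, 35, 0, 33.6, 0.627, 50]])
probability = model.predict(sample)[0][0]   # sigmoid output in [0, 1]
print(round(float(probability)))            # rounds to 1 or 0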

Why am I getting horizontal line (almost zero) from neural network instead of the desired curve?

I am trying to use a neural network for my regression problem in Python, but the output of the neural network is a straight horizontal line at zero. I have one input and, obviously, one output.
Here is my code:
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import KFold, cross_val_score

def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(1, input_dim=1, kernel_initializer='normal', activation='relu'))
    model.add(Dense(4, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', metrics=['mse'], optimizer='adam')
    model.summary()
    return model

# evaluate model
estimator = KerasRegressor(build_fn=baseline_model, epochs=50, batch_size=64, validation_split=0.2, verbose=1)
kfold = KFold(n_splits=10)
results = cross_val_score(estimator, X_train, y_train, cv=kfold)
Here are the plots of NN prediction vs. target for both training and test data.
[plot: Training Data]
[plot: Test Data]
I have also tried different weight initializers (Xavier and He) with no luck!
I really appreciate your help
First of all, correct your syntax when adding the Dense layers: replace the double equals (==) with a single equals (=) for kernel_initializer, like below:
model.add(Dense(1, input_dim=1, kernel_initializer='normal', activation='relu'))
Then, to improve the performance, do the following:
Increase the number of hidden neurons in the hidden layers.
Increase the number of hidden layers (see the sketch further below).
If you still have the same problem, try changing the optimizer and the activation function. Tuning the hyperparameters may help you converge to a solution.
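For example, a wider and deeper variant along those lines could look like this (a sketch only; the layer sizes are arbitrary choices, not values from the question):

from keras.models import Sequential
from keras.layers import Dense

def bigger_model():
    # Wider layers and an extra hidden layer compared to baseline_model()
    model = Sequential()
    model.add(Dense(64, input_dim=1, kernel_initializer='normal', activation='relu'))
    model.add(Dense(64, kernel_initializer='normal', activation='relu'))
    model.add(Dense(32, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', metrics=['mse'], optimizer='adam')
    return model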
EDIT 1
You also have to fit the estimator after cross-validation, like below:
estimator.fit(X_train, y_train)
and then you can evaluate on the test data as follows (using a regression metric, since this is a regression problem):
prediction = estimator.predict(X_test)
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, prediction)

Why do I fail to predict a linear equation (Y=2*x) with Keras?

I tried to predict a linear equation (y = 2*x) with Keras, but it failed.
With a sigmoid activation function I get rectangular predictions; with ReLU I get NaNs.
What is the cause? How could I change the code so it predicts y = 2*x?
import numpy as np
from keras.layers import Dense, Activation
from keras.models import Sequential
import matplotlib.pyplot as plt
import math
import time
x = np.arange(-100, 100, 0.5)
y = x*2
model = Sequential()
model.add(Dense(10, input_shape=(1,)))
model.add(Activation('sigmoid'))
model.add(Dense(20) )
model.add(Activation('sigmoid'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='SGD', metrics=['mean_squared_error'])
t1 = time.clock()
for i in range(40):
    model.fit(x, y, epochs=1000, batch_size=len(x), verbose=0)
    predictions = model.predict(x)
    print(i, " ", np.mean(np.square(predictions - y)), " t: ", time.clock()-t1)
    plt.hold(False)
    plt.plot(x, y, 'b', x, predictions, 'r--')
    plt.hold(True)
    plt.ylabel('Y / Predicted Value')
    plt.xlabel('X Value')
    plt.title([str(i), " Loss: ", np.mean(np.square(predictions - y)), " t: ", str(time.clock()-t1)])
    plt.pause(0.001)
    #plt.savefig("fig2.png")
plt.show()
Although it may seem at first glance that the default learning rate is inappropriate, the real problem here is that the sigmoid activation is inappropriate.
Why? Because your desired output should NOT be bounded, but using sigmoid implies a bounded output. To be more precise, your last layer computes an output y as
y = \sum_i w_i * x_i + b
where x_i is the output of the second-to-last layer, which is activated by sigmoid, meaning x_i \in [0, 1]. For this reason, your output y is bounded as y \in [-V+b, +V+b], where V = |w_0| + |w_1| + ... + |w_{19}| is the L1 norm of the weight matrix, i.e. V = L1norm(W).
Since the weight matrix W is learned from your training data, it is safe to conclude that your model will NOT generalize to test data whose values lie outside the range (min(x_train), max(x_train)).
How to fix?
Thought 1: for this simple problem, you actually don't need any nonlinearity. Simply use a linear MLP as follows.
model = Sequential()
model.add(Dense(1, input_shape=(1,)))
model.compile(loss='mse', optimizer='adam')
I tested it, and it should converge in 200 epochs with an MSE around 1e-5.
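As a usage sketch (reusing the x and y arrays defined in the question):

import numpy as np

model.fit(x, y, epochs=200, verbose=0)    # the linear model from Thought 1
print(model.predict(np.array([[7.0]])))   # expected to be close to [[14.]] once converged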
Thought 2: use a different activation function that does not suffer the bounded output issue, e.g. relu (note: tanh is also inappropriate for the same reason).
model = Sequential()
model.add(Dense(10, input_shape=(1,)))
model.add(Activation('relu'))
model.add(Dense(20) )
model.add(Activation('relu'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
I also tested this model, and it should converge even faster, with a comparable MSE.
